The Start Of It All...

Libraries

In [1]:
#Change the width of the notebook
from IPython.display import display, HTML
display(HTML("<style>.container { width:90% !important; }</style>"))
In [2]:
#Standard Libraries
import numpy as np
import pandas as pd
from IPython.display import display # Allows the use of display() for DataFrames
import random
from pandas.api.types import is_string_dtype
from pandas.api.types import is_numeric_dtype
#Use Counter to check frequencies (https://pymotw.com/2/collections/counter.html)
from collections import Counter
In [3]:
#Visualisation Libraries
import matplotlib.pyplot as plt
from matplotlib.font_manager import FontProperties
from pandas.plotting import scatter_matrix
import seaborn as sns
# Pretty display for notebooks
%matplotlib inline
# Import supplementary visualizations code visuals.py
import visuals as vs
In [4]:
#Modelling Libraries
from sklearn.model_selection import train_test_split,cross_val_score
from sklearn import tree
from sklearn.metrics import fbeta_score,accuracy_score
from sklearn.preprocessing import MinMaxScaler
from scipy.stats import boxcox
from sklearn.decomposition import PCA
from sklearn.cluster import DBSCAN, KMeans
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture
from sklearn import metrics
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

Data

In [5]:
sampleMode = 1
In [6]:
if sampleMode == 1:
    file = "sample_model_data_protected_small5k.csv"
else:
    file = "sample_model_data_protected_60k.csv"
# Load Company IP Intent Data (March Sample)
try:
    full_data = pd.read_csv(file)
    print("Company IP Intent dataset has {} samples with {} features each.".format(*full_data.shape))
except Exception as err:
    print("Dataset could not be loaded!", err)

print("Do not use excel UTF-8 format for csv, it breaks stuff...")
Company IP Intent dataset has 5099 samples with 46 features each.
Do not use excel UTF-8 format for csv, it breaks stuff...

Take a random sample for modelling due to size

In [7]:
def getRandomIDXs(n,d):
    N=len(d)
    indices=[]
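    for i in range(0, n):
        #randint includes both endpoints, so the upper bound must be N - 1
        randomInt = random.randint(0, N - 1)
        #Repeated draws are skipped, so fewer than n indices may be returned
        if randomInt not in indices: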
            indices.append(randomInt)
    print("Sample indices selected - ", str(len(indices)))
    return indices

def getRandomSample(n,d):
    N=len(d)
    n=round(n*N)
    indices = getRandomIDXs(n,d)
    sample = pd.DataFrame(d.reindex(indices), columns = d.keys()).reset_index(drop = True)
    #sample = pd.DataFrame(d.loc[indices], columns = d.keys()).reset_index(drop = True)
    print(sample.shape)
    return sample
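The sampler above draws indices with replacement and skips repeats, so it can return fewer rows than requested (consistent with 3229 rows coming back from a request for 100% of 5099). A minimal sketch of an alternative, assuming an exact-size sample without replacement is what is wanted; `sample_without_replacement`, `seed` and the toy frame are illustrative names, not part of the notebook:

```python
import pandas as pd

def sample_without_replacement(d, frac, seed=0):
    """Return round(frac * len(d)) distinct rows of d in random order."""
    # DataFrame.sample never repeats a row unless replace=True is passed
    return d.sample(frac=frac, random_state=seed).reset_index(drop=True)

# Toy data standing in for full_data (hypothetical values)
toy = pd.DataFrame({"acct_id": [f"A{i}" for i in range(10)],
                    "hitsSum": range(10)})
half = sample_without_replacement(toy, 0.5)
print(half.shape)
```

With `frac=1.0` this returns every row once, shuffled, instead of roughly 63% of them.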
In [8]:
#Take a sample due to the size of the dataset here
data = getRandomSample(1.0,full_data)
#data = full_data
print("Chosen samples of Company Intent customers dataset:",str(len(data)))
data[:3]
Sample indices selected -  3229
(3229, 46)
Chosen samples of Company Intent customers dataset: 3229
Out[8]:
   acct_id    hq_id      company   userDomain  city        region        countryCode  revenue_mil_usd  total_employees  naic3  ...  hits_iabCat_IAB_17  hits_iabCat_IAB_18  hits_iabCat_IAB_19  hits_iabCat_IAB_20  hits_iabCat_IAB_21  hits_iabCat_IAB_22  hits_iabCat_IAB_23  hits_iabCat_IAB_24  hits_iabCat_IAB_25  hits_iabCat_IAB_26
0  1-10NCV05  1-10NCV05  REDACTED  REDACTED    Vaestervik  Kalmar lan    SE           188.7810         3000.0           623.0  ...  16.0                0.0                 82.0                0.0                 0.0                 35.0                0.0                 0.0                 0.0                 0.0
1  1-10426BT  1-10426BT  REDACTED  REDACTED    Allentown   Pennsylvania  US           6.9941           50.0             541.0  ...  2.0                 4.0                 45.0                0.0                 2.0                 9.0                 0.0                 1.0                 0.0                 0.0
2  1-10DADIG  1-GHL4HH   REDACTED  REDACTED    Malone      New York      US           1812.8530        30000.0          721.0  ...  4.0                 5.0                 44.0                0.0                 0.0                 1.0                 0.0                 3.0                 0.0                 5.0

3 rows × 46 columns

In [9]:
data.dtypes
Out[9]:
acct_id                object
hq_id                  object
company                object
userDomain             object
city                   object
region                 object
countryCode            object
revenue_mil_usd       float64
total_employees       float64
naic3                 float64
naic6                 float64
isISP                 float64
usageType              object
datesCount            float64
domainsCount          float64
hitsSum               float64
pageViewsSum          float64
uniqueViewsSum        float64
clicks                float64
clickDates            float64
hits_iabCat_IAB_1     float64
hits_iabCat_IAB_2     float64
hits_iabCat_IAB_3     float64
hits_iabCat_IAB_4     float64
hits_iabCat_IAB_5     float64
hits_iabCat_IAB_6     float64
hits_iabCat_IAB_7     float64
hits_iabCat_IAB_8     float64
hits_iabCat_IAB_9     float64
hits_iabCat_IAB_10    float64
hits_iabCat_IAB_11    float64
hits_iabCat_IAB_12    float64
hits_iabCat_IAB_13    float64
hits_iabCat_IAB_14    float64
hits_iabCat_IAB_15    float64
hits_iabCat_IAB_16    float64
hits_iabCat_IAB_17    float64
hits_iabCat_IAB_18    float64
hits_iabCat_IAB_19    float64
hits_iabCat_IAB_20    float64
hits_iabCat_IAB_21    float64
hits_iabCat_IAB_22    float64
hits_iabCat_IAB_23    float64
hits_iabCat_IAB_24    float64
hits_iabCat_IAB_25    float64
hits_iabCat_IAB_26    float64
dtype: object
In [10]:
#Fill nulls to prevent NaN issues: 0 for numeric columns, "Unknown" otherwise
def zeroFillColumn(d):
    cols = d.columns
    for col in cols:
        if is_numeric_dtype(d[col]):
            #Assign back rather than fillna(inplace=True) on a column selection
            d[col] = d[col].fillna(0)
        else:
            d[col] = d[col].fillna("Unknown")
    return d
data = zeroFillColumn(data)

Data Exploration

In this section I will attempt to find correlations in my data with summary statistics and visualisations to help guide the supervised learning

Investigate the Reliability of the Columns

This dataset has been derived from a production system, so some metrics may be too sparse, or simply too unreliable because of poor matching between disparate datasets

Flag and remove these columns early on, mainly for performance reasons

In [11]:
# Display a description of the dataset
t=data.describe()
t.to_csv('summary_statistics.csv')
data.describe()
Out[11]:
revenue_mil_usd total_employees naic3 naic6 isISP datesCount domainsCount hitsSum pageViewsSum uniqueViewsSum ... hits_iabCat_IAB_17 hits_iabCat_IAB_18 hits_iabCat_IAB_19 hits_iabCat_IAB_20 hits_iabCat_IAB_21 hits_iabCat_IAB_22 hits_iabCat_IAB_23 hits_iabCat_IAB_24 hits_iabCat_IAB_25 hits_iabCat_IAB_26
count 3.229000e+03 3229.000000 3229.000000 3229.000000 3229.000000 3229.000000 3229.000000 3229.000000 3229.000000 3229.000000 ... 3229.000000 3229.000000 3229.000000 3229.000000 3229.000000 3229.000000 3229.000000 3229.000000 3229.000000 3229.000000
mean 3.352560e+04 9117.914525 468.347166 468655.573862 0.002478 11.990709 42.624342 390.220192 1331.291421 606.606070 ... 8.375039 6.105915 94.045835 1.059151 3.309074 48.445339 0.132858 2.050480 0.000310 2.628678
std 3.970458e+05 42580.925328 197.167294 197183.887208 0.049721 8.394024 131.456719 3497.179011 13178.931135 5334.439119 ... 62.682351 58.100380 918.717257 8.784028 33.904887 627.382010 1.096068 15.693537 0.017598 22.540338
min 0.000000e+00 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
25% 7.566200e+00 60.000000 332.000000 332312.000000 0.000000 5.000000 5.000000 11.000000 20.000000 15.000000 ... 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
50% 1.831860e+01 95.000000 484.000000 484121.000000 0.000000 11.000000 13.000000 35.000000 83.000000 53.000000 ... 0.000000 0.000000 6.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
75% 7.547742e+01 510.000000 611.000000 611310.000000 0.000000 18.000000 34.000000 114.000000 330.000000 189.000000 ... 3.000000 3.000000 26.000000 0.000000 1.000000 6.000000 0.000000 0.000000 0.000000 0.000000
max 5.057748e+06 570000.000000 928.000000 928120.000000 1.000000 31.000000 3680.000000 108392.000000 423079.000000 153794.000000 ... 2066.000000 2140.000000 31710.000000 335.000000 1368.000000 31158.000000 32.000000 524.000000 1.000000 690.000000

8 rows × 38 columns

  • Ignore NAICS codes.

  • hitsSum/pageViewsSum/uniqueViewsSum/most of the iabCat hits - Vary massively between accounts. Highly right-skewed, with the std well above the 75th percentile

  • A lot of iabCat hits columns are 0 for everything at or below the 75th percentile. These likely need to be removed from the dataset, as no significant inferences can be made if only a few companies have hits for a given category
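The two observations above (heavy right skew, columns that are mostly zero) can be checked per column rather than read off the describe() table. A rough sketch, assuming the share of exact zeros and the sample skewness are acceptable proxies; `sparsity_report` and the toy columns are hypothetical names used only for illustration:

```python
import pandas as pd

def sparsity_report(d):
    """Share of zeros and skewness for every numeric column of d."""
    num = d.select_dtypes(include="number")
    report = pd.DataFrame({
        "zero_share": (num == 0).mean(),  # fraction of rows equal to 0
        "skew": num.skew(),               # > 0 means a long right tail
    })
    return report.sort_values("zero_share", ascending=False)

# Toy data: one mostly-zero column and one evenly spread column
toy = pd.DataFrame({
    "hits_a": [0, 0, 0, 0, 0, 0, 0, 50],
    "hits_b": [1, 2, 3, 4, 5, 6, 7, 8],
})
report = sparsity_report(toy)
print(report)
```

Columns with a zero_share above 0.75 are the ones whose 75th percentile is 0 in the summary table.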
In [12]:
#Group columns together for analysis
iabCat_hits_cols = [col for col in data.columns if 'hits_iabCat' in col]
summaryMetricCols = ['datesCount', 'domainsCount', 'hitsSum','pageViewsSum', 'uniqueViewsSum', 'clicks', 'clickDates']
labelCols = ['acct_id','hq_id','company','userDomain','city','region','countryCode','revenue_mil_usd','total_employees']
coninuousCols = iabCat_hits_cols + summaryMetricCols + ['revenue_mil_usd','total_employees']
In [13]:
boxplot = data.boxplot(column=summaryMetricCols,figsize=(15,15))
In [14]:
boxplot = data.boxplot(column=iabCat_hits_cols,figsize=(15,15))

Remove outlier columns early on

In [15]:
#Get Outliers IDXs per feature column
def getOutLierIDXs(d):
    #Initialise counters
    outliers = []
    featureDict = {}
    # For each feature find the data points with extreme high or low values
    for feature in d.keys():
        #Skip non-numeric columns
        if is_numeric_dtype(d[feature]):
            #Calculate Q1 (25th percentile of the data) for the given feature
            Q1 = np.percentile(d[feature],25)
            #Calculate Q3 (75th percentile of the data) for the given feature
            Q3 = np.percentile(d[feature],75)
            #Use the interquartile range to calculate an outlier step (1.5 times the interquartile range)
            step = (Q3 - Q1)*1.5
            #Outlier if value is at least one step below Q1 OR at least one step above Q3
            #The comparisons are inclusive, so a column with Q1 == Q3 (step of 0) flags every row
            featureOutliers = d[(d[feature] <= Q1 - step) | (d[feature] >= Q3 + step)]
            outlierIDXs = featureOutliers.index.tolist()
            featureDict[feature] = len(featureOutliers)
            print("Number of outliers detected for feature - " + str(feature) + " = " +str(len(featureOutliers)))
            #Add to global list of outliersIDXs
            outliers.extend(outlierIDXs)
    return (outliers,featureDict)
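A small worked example of the 1.5 x IQR rule used above, showing why a mostly-zero column reports every row as an outlier: when Q1 and Q3 are both 0 the step is 0, and the inclusive comparisons match every value. This is only a sketch; `iqr_outlier_mask` and its `inclusive` switch are illustrative and not part of the notebook:

```python
import numpy as np
import pandas as pd

def iqr_outlier_mask(s, inclusive=True):
    """Boolean mask of values outside Q1 - 1.5*IQR .. Q3 + 1.5*IQR."""
    q1, q3 = np.percentile(s, [25, 75])
    step = 1.5 * (q3 - q1)
    if inclusive:
        # Same comparisons as getOutLierIDXs (<= and >=)
        return (s <= q1 - step) | (s >= q3 + step)
    # Strict variant: values sitting exactly on a fence are not flagged
    return (s < q1 - step) | (s > q3 + step)

# Mostly-zero column, similar in shape to isISP or clicks
sparse = pd.Series([0, 0, 0, 0, 0, 0, 0, 1])
n_inclusive = int(iqr_outlier_mask(sparse, inclusive=True).sum())
n_strict = int(iqr_outlier_mask(sparse, inclusive=False).sum())
print(n_inclusive, n_strict)
```

Here the inclusive version flags all 8 rows while the strict version flags only the single non-zero value, so the "all rows are outliers" count acts as a marker for columns with zero interquartile range.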
In [16]:
#Remove any columns where all values are classed as outliers, a good few iabCat columns
res = getOutLierIDXs(data)
outliers = res[0]
featureDict = res[1]

#Clean dataset
datac = data.copy()
badBoys = []
for feature in featureDict.keys():
    if (featureDict[feature]) == len(datac):
        badBoys.append(feature)
        print(str(feature) + "- Feature added to blacklist")
#Drop Bad columns
for feature in badBoys:
    datac.drop([feature], axis = 1, inplace = True)
Number of outliers detected for feature - revenue_mil_usd = 579
Number of outliers detected for feature - total_employees = 563
Number of outliers detected for feature - naic3 = 0
Number of outliers detected for feature - naic6 = 0
Number of outliers detected for feature - isISP = 3229
Number of outliers detected for feature - datesCount = 0
Number of outliers detected for feature - domainsCount = 356
Number of outliers detected for feature - hitsSum = 427
Number of outliers detected for feature - pageViewsSum = 450
Number of outliers detected for feature - uniqueViewsSum = 403
Number of outliers detected for feature - clicks = 3229
Number of outliers detected for feature - clickDates = 3229
Number of outliers detected for feature - hits_iabCat_IAB_1 = 428
Number of outliers detected for feature - hits_iabCat_IAB_2 = 464
Number of outliers detected for feature - hits_iabCat_IAB_3 = 483
Number of outliers detected for feature - hits_iabCat_IAB_4 = 494
Number of outliers detected for feature - hits_iabCat_IAB_5 = 572
Number of outliers detected for feature - hits_iabCat_IAB_6 = 3229
Number of outliers detected for feature - hits_iabCat_IAB_7 = 520
Number of outliers detected for feature - hits_iabCat_IAB_8 = 3229
Number of outliers detected for feature - hits_iabCat_IAB_9 = 435
Number of outliers detected for feature - hits_iabCat_IAB_10 = 584
Number of outliers detected for feature - hits_iabCat_IAB_11 = 3229
Number of outliers detected for feature - hits_iabCat_IAB_12 = 441
Number of outliers detected for feature - hits_iabCat_IAB_13 = 450
Number of outliers detected for feature - hits_iabCat_IAB_14 = 3229
Number of outliers detected for feature - hits_iabCat_IAB_15 = 501
Number of outliers detected for feature - hits_iabCat_IAB_16 = 3229
Number of outliers detected for feature - hits_iabCat_IAB_17 = 457
Number of outliers detected for feature - hits_iabCat_IAB_18 = 333
Number of outliers detected for feature - hits_iabCat_IAB_19 = 418
Number of outliers detected for feature - hits_iabCat_IAB_20 = 3229
Number of outliers detected for feature - hits_iabCat_IAB_21 = 449
Number of outliers detected for feature - hits_iabCat_IAB_22 = 538
Number of outliers detected for feature - hits_iabCat_IAB_23 = 3229
Number of outliers detected for feature - hits_iabCat_IAB_24 = 3229
Number of outliers detected for feature - hits_iabCat_IAB_25 = 3229
Number of outliers detected for feature - hits_iabCat_IAB_26 = 3229
isISP- Feature added to blacklist
clicks- Feature added to blacklist
clickDates- Feature added to blacklist
hits_iabCat_IAB_6- Feature added to blacklist
hits_iabCat_IAB_8- Feature added to blacklist
hits_iabCat_IAB_11- Feature added to blacklist
hits_iabCat_IAB_14- Feature added to blacklist
hits_iabCat_IAB_16- Feature added to blacklist
hits_iabCat_IAB_20- Feature added to blacklist
hits_iabCat_IAB_23- Feature added to blacklist
hits_iabCat_IAB_24- Feature added to blacklist
hits_iabCat_IAB_25- Feature added to blacklist
hits_iabCat_IAB_26- Feature added to blacklist
In [17]:
#Find rows that are outliers in more than one feature
def getMultiClassOutliers(d):
    #Get outlier indexes per column
    res = getOutLierIDXs(d)
    outliers = res[0]
    featureDict = res[1]
    #Capture multi-feature outliers
    multiOutliers=[]
    countsDict = dict(Counter(outliers))
    for idx in list(countsDict.keys()):
        if countsDict[idx] > 1:
            multiOutliers.append(idx)
            print('Multi-Feature outlier found at index = ' + str(idx))
    print('Total multi-Feature outliers found = ' + str(len(multiOutliers)) + ' out of sample of ' + str(len(d)))
    return multiOutliers
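The same multi-feature check can be sketched without the per-row prints by building one boolean flag per cell and counting flags across each row. This assumes the same inclusive 1.5 x IQR fences as getOutLierIDXs; `multi_feature_outliers` and `min_features` are hypothetical names:

```python
import pandas as pd

def multi_feature_outliers(d, min_features=2):
    """Index labels of rows flagged as outliers in at least min_features columns."""
    num = d.select_dtypes(include="number")
    q1 = num.quantile(0.25)
    q3 = num.quantile(0.75)
    step = 1.5 * (q3 - q1)
    # Compare every column against its own fences (aligned on column names)
    flags = num.le(q1 - step, axis=1) | num.ge(q3 + step, axis=1)
    counts = flags.sum(axis=1)
    return counts[counts >= min_features].index.tolist()

# Toy data: the last row is extreme in columns a and b, but not in c
toy = pd.DataFrame({
    "a": [1, 2, 3, 4, 5, 100],
    "b": [10, 11, 12, 13, 14, 500],
    "c": [5, 6, 7, 8, 9, 10],
    "name": list("uvwxyz"),
})
found = multi_feature_outliers(toy)
print(found)
```

Raising `min_features` makes the filter stricter, which may be useful when a large share of the sample is flagged.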
In [18]:
#Re-run outlier detection and remove multi-feature outliers
outliersN = getMultiClassOutliers(datac)
Number of outliers detected for feature - revenue_mil_usd = 579
Number of outliers detected for feature - total_employees = 563
Number of outliers detected for feature - naic3 = 0
Number of outliers detected for feature - naic6 = 0
Number of outliers detected for feature - datesCount = 0
Number of outliers detected for feature - domainsCount = 356
Number of outliers detected for feature - hitsSum = 427
Number of outliers detected for feature - pageViewsSum = 450
Number of outliers detected for feature - uniqueViewsSum = 403
Number of outliers detected for feature - hits_iabCat_IAB_1 = 428
Number of outliers detected for feature - hits_iabCat_IAB_2 = 464
Number of outliers detected for feature - hits_iabCat_IAB_3 = 483
Number of outliers detected for feature - hits_iabCat_IAB_4 = 494
Number of outliers detected for feature - hits_iabCat_IAB_5 = 572
Number of outliers detected for feature - hits_iabCat_IAB_7 = 520
Number of outliers detected for feature - hits_iabCat_IAB_9 = 435
Number of outliers detected for feature - hits_iabCat_IAB_10 = 584
Number of outliers detected for feature - hits_iabCat_IAB_12 = 441
Number of outliers detected for feature - hits_iabCat_IAB_13 = 450
Number of outliers detected for feature - hits_iabCat_IAB_15 = 501
Number of outliers detected for feature - hits_iabCat_IAB_17 = 457
Number of outliers detected for feature - hits_iabCat_IAB_18 = 333
Number of outliers detected for feature - hits_iabCat_IAB_19 = 418
Number of outliers detected for feature - hits_iabCat_IAB_21 = 449
Number of outliers detected for feature - hits_iabCat_IAB_22 = 538
Multi-Feature outlier found at index = 0
Multi-Feature outlier found at index = 2
Multi-Feature outlier found at index = 5
Multi-Feature outlier found at index = 14
Multi-Feature outlier found at index = 25
Multi-Feature outlier found at index = 28
Multi-Feature outlier found at index = 30
Multi-Feature outlier found at index = 34
Multi-Feature outlier found at index = 39
Multi-Feature outlier found at index = 40
Multi-Feature outlier found at index = 43
Multi-Feature outlier found at index = 55
Multi-Feature outlier found at index = 89
Multi-Feature outlier found at index = 95
Multi-Feature outlier found at index = 103
Multi-Feature outlier found at index = 105
Multi-Feature outlier found at index = 109
Multi-Feature outlier found at index = 110
Multi-Feature outlier found at index = 118
Multi-Feature outlier found at index = 120
Multi-Feature outlier found at index = 122
Multi-Feature outlier found at index = 128
Multi-Feature outlier found at index = 134
Multi-Feature outlier found at index = 145
Multi-Feature outlier found at index = 146
Multi-Feature outlier found at index = 148
Multi-Feature outlier found at index = 155
Multi-Feature outlier found at index = 157
Multi-Feature outlier found at index = 159
Multi-Feature outlier found at index = 160
Multi-Feature outlier found at index = 163
Multi-Feature outlier found at index = 173
Multi-Feature outlier found at index = 178
Multi-Feature outlier found at index = 189
Multi-Feature outlier found at index = 190
Multi-Feature outlier found at index = 195
Multi-Feature outlier found at index = 200
Multi-Feature outlier found at index = 204
Multi-Feature outlier found at index = 213
Multi-Feature outlier found at index = 217
Multi-Feature outlier found at index = 241
Multi-Feature outlier found at index = 243
Multi-Feature outlier found at index = 248
Multi-Feature outlier found at index = 255
Multi-Feature outlier found at index = 261
Multi-Feature outlier found at index = 266
Multi-Feature outlier found at index = 271
Multi-Feature outlier found at index = 300
Multi-Feature outlier found at index = 301
Multi-Feature outlier found at index = 308
Multi-Feature outlier found at index = 311
Multi-Feature outlier found at index = 316
Multi-Feature outlier found at index = 320
Multi-Feature outlier found at index = 339
Multi-Feature outlier found at index = 346
Multi-Feature outlier found at index = 353
Multi-Feature outlier found at index = 357
Multi-Feature outlier found at index = 366
Multi-Feature outlier found at index = 378
Multi-Feature outlier found at index = 388
Multi-Feature outlier found at index = 394
Multi-Feature outlier found at index = 400
Multi-Feature outlier found at index = 402
Multi-Feature outlier found at index = 404
Multi-Feature outlier found at index = 409
Multi-Feature outlier found at index = 411
Multi-Feature outlier found at index = 414
Multi-Feature outlier found at index = 428
Multi-Feature outlier found at index = 431
Multi-Feature outlier found at index = 447
Multi-Feature outlier found at index = 449
Multi-Feature outlier found at index = 450
Multi-Feature outlier found at index = 456
Multi-Feature outlier found at index = 460
Multi-Feature outlier found at index = 469
Multi-Feature outlier found at index = 483
Multi-Feature outlier found at index = 486
Multi-Feature outlier found at index = 492
Multi-Feature outlier found at index = 499
Multi-Feature outlier found at index = 503
Multi-Feature outlier found at index = 511
Multi-Feature outlier found at index = 515
Multi-Feature outlier found at index = 527
Multi-Feature outlier found at index = 538
Multi-Feature outlier found at index = 542
Multi-Feature outlier found at index = 554
Multi-Feature outlier found at index = 559
Multi-Feature outlier found at index = 575
Multi-Feature outlier found at index = 581
Multi-Feature outlier found at index = 592
Multi-Feature outlier found at index = 597
Multi-Feature outlier found at index = 603
Multi-Feature outlier found at index = 605
Multi-Feature outlier found at index = 606
Multi-Feature outlier found at index = 609
Multi-Feature outlier found at index = 618
Multi-Feature outlier found at index = 620
Multi-Feature outlier found at index = 628
Multi-Feature outlier found at index = 631
Multi-Feature outlier found at index = 635
Multi-Feature outlier found at index = 643
Multi-Feature outlier found at index = 647
Multi-Feature outlier found at index = 657
Multi-Feature outlier found at index = 675
Multi-Feature outlier found at index = 676
Multi-Feature outlier found at index = 693
Multi-Feature outlier found at index = 694
Multi-Feature outlier found at index = 706
Multi-Feature outlier found at index = 707
Multi-Feature outlier found at index = 712
Multi-Feature outlier found at index = 718
Multi-Feature outlier found at index = 719
Multi-Feature outlier found at index = 729
Multi-Feature outlier found at index = 732
Multi-Feature outlier found at index = 742
Multi-Feature outlier found at index = 744
Multi-Feature outlier found at index = 753
Multi-Feature outlier found at index = 754
Multi-Feature outlier found at index = 755
Multi-Feature outlier found at index = 762
Multi-Feature outlier found at index = 773
Multi-Feature outlier found at index = 779
Multi-Feature outlier found at index = 780
Multi-Feature outlier found at index = 795
Multi-Feature outlier found at index = 798
Multi-Feature outlier found at index = 800
Multi-Feature outlier found at index = 813
Multi-Feature outlier found at index = 825
Multi-Feature outlier found at index = 830
Multi-Feature outlier found at index = 837
Multi-Feature outlier found at index = 842
Multi-Feature outlier found at index = 848
Multi-Feature outlier found at index = 860
Multi-Feature outlier found at index = 865
Multi-Feature outlier found at index = 871
Multi-Feature outlier found at index = 890
Multi-Feature outlier found at index = 896
Multi-Feature outlier found at index = 909
Multi-Feature outlier found at index = 914
Multi-Feature outlier found at index = 917
Multi-Feature outlier found at index = 922
Multi-Feature outlier found at index = 923
Multi-Feature outlier found at index = 925
Multi-Feature outlier found at index = 928
Multi-Feature outlier found at index = 933
Multi-Feature outlier found at index = 940
Multi-Feature outlier found at index = 941
Multi-Feature outlier found at index = 943
Multi-Feature outlier found at index = 949
Multi-Feature outlier found at index = 952
Multi-Feature outlier found at index = 967
Multi-Feature outlier found at index = 970
Multi-Feature outlier found at index = 972
Multi-Feature outlier found at index = 976
Multi-Feature outlier found at index = 982
Multi-Feature outlier found at index = 995
Multi-Feature outlier found at index = 997
Multi-Feature outlier found at index = 998
Multi-Feature outlier found at index = 1002
Multi-Feature outlier found at index = 1014
Multi-Feature outlier found at index = 1015
Multi-Feature outlier found at index = 1023
Multi-Feature outlier found at index = 1026
Multi-Feature outlier found at index = 1027
Multi-Feature outlier found at index = 1032
Multi-Feature outlier found at index = 1033
Multi-Feature outlier found at index = 1045
Multi-Feature outlier found at index = 1049
Multi-Feature outlier found at index = 1050
Multi-Feature outlier found at index = 1054
Multi-Feature outlier found at index = 1060
Multi-Feature outlier found at index = 1062
Multi-Feature outlier found at index = 1070
Multi-Feature outlier found at index = 1071
Multi-Feature outlier found at index = 1072
Multi-Feature outlier found at index = 1076
Multi-Feature outlier found at index = 1083
Multi-Feature outlier found at index = 1090
Multi-Feature outlier found at index = 1095
Multi-Feature outlier found at index = 1100
Multi-Feature outlier found at index = 1105
Multi-Feature outlier found at index = 1109
Multi-Feature outlier found at index = 1110
Multi-Feature outlier found at index = 1117
Multi-Feature outlier found at index = 1139
Multi-Feature outlier found at index = 1140
Multi-Feature outlier found at index = 1147
Multi-Feature outlier found at index = 1154
Multi-Feature outlier found at index = 1158
Multi-Feature outlier found at index = 1165
Multi-Feature outlier found at index = 1187
Multi-Feature outlier found at index = 1203
Multi-Feature outlier found at index = 1204
Multi-Feature outlier found at index = 1217
Multi-Feature outlier found at index = 1226
Multi-Feature outlier found at index = 1228
Multi-Feature outlier found at index = 1230
Multi-Feature outlier found at index = 1232
Multi-Feature outlier found at index = 1238
Multi-Feature outlier found at index = 1240
Multi-Feature outlier found at index = 1244
Multi-Feature outlier found at index = 1250
Multi-Feature outlier found at index = 1258
Multi-Feature outlier found at index = 1273
Multi-Feature outlier found at index = 1274
Multi-Feature outlier found at index = 1275
Multi-Feature outlier found at index = 1277
Multi-Feature outlier found at index = 1281
Multi-Feature outlier found at index = 1287
Multi-Feature outlier found at index = 1290
Multi-Feature outlier found at index = 1295
Multi-Feature outlier found at index = 1299
Multi-Feature outlier found at index = 1302
Multi-Feature outlier found at index = 1304
Multi-Feature outlier found at index = 1317
Multi-Feature outlier found at index = 1325
Multi-Feature outlier found at index = 1329
Multi-Feature outlier found at index = 1357
Multi-Feature outlier found at index = 1361
Multi-Feature outlier found at index = 1362
Multi-Feature outlier found at index = 1376
Multi-Feature outlier found at index = 1379
Multi-Feature outlier found at index = 1380
Multi-Feature outlier found at index = 1382
Multi-Feature outlier found at index = 1387
Multi-Feature outlier found at index = 1388
Multi-Feature outlier found at index = 1396
Multi-Feature outlier found at index = 1405
Multi-Feature outlier found at index = 1410
Multi-Feature outlier found at index = 1420
Multi-Feature outlier found at index = 1427
Multi-Feature outlier found at index = 1430
Multi-Feature outlier found at index = 1440
Multi-Feature outlier found at index = 1442
Multi-Feature outlier found at index = 1443
Multi-Feature outlier found at index = 1453
Multi-Feature outlier found at index = 1455
Multi-Feature outlier found at index = 1459
Multi-Feature outlier found at index = 1466
Multi-Feature outlier found at index = 1472
Multi-Feature outlier found at index = 1474
Multi-Feature outlier found at index = 1485
Multi-Feature outlier found at index = 1500
Multi-Feature outlier found at index = 1501
Multi-Feature outlier found at index = 1509
Multi-Feature outlier found at index = 1513
Multi-Feature outlier found at index = 1514
Multi-Feature outlier found at index = 1515
Multi-Feature outlier found at index = 1526
Multi-Feature outlier found at index = 1528
Multi-Feature outlier found at index = 1530
Multi-Feature outlier found at index = 1531
Multi-Feature outlier found at index = 1535
Multi-Feature outlier found at index = 1537
Multi-Feature outlier found at index = 1542
Multi-Feature outlier found at index = 1545
Multi-Feature outlier found at index = 1550
Multi-Feature outlier found at index = 1556
Multi-Feature outlier found at index = 1559
Multi-Feature outlier found at index = 1568
Multi-Feature outlier found at index = 1570
Multi-Feature outlier found at index = 1584
Multi-Feature outlier found at index = 1591
Multi-Feature outlier found at index = 1595
Multi-Feature outlier found at index = 1601
Multi-Feature outlier found at index = 1604
Multi-Feature outlier found at index = 1609
Multi-Feature outlier found at index = 1610
Multi-Feature outlier found at index = 1615
Multi-Feature outlier found at index = 1623
Multi-Feature outlier found at index = 1624
Multi-Feature outlier found at index = 1626
Multi-Feature outlier found at index = 1639
Multi-Feature outlier found at index = 1647
Multi-Feature outlier found at index = 1648
Multi-Feature outlier found at index = 1649
Multi-Feature outlier found at index = 1657
Multi-Feature outlier found at index = 1665
Multi-Feature outlier found at index = 1669
Multi-Feature outlier found at index = 1679
Multi-Feature outlier found at index = 1684
Multi-Feature outlier found at index = 1686
Multi-Feature outlier found at index = 1695
Multi-Feature outlier found at index = 1707
Multi-Feature outlier found at index = 1710
Multi-Feature outlier found at index = 1715
Multi-Feature outlier found at index = 1717
Multi-Feature outlier found at index = 1746
Multi-Feature outlier found at index = 1758
Multi-Feature outlier found at index = 1762
Multi-Feature outlier found at index = 1764
Multi-Feature outlier found at index = 1766
Multi-Feature outlier found at index = 1767
Multi-Feature outlier found at index = 1773
Multi-Feature outlier found at index = 1778
Multi-Feature outlier found at index = 1781
Multi-Feature outlier found at index = 1787
Multi-Feature outlier found at index = 1788
Multi-Feature outlier found at index = 1790
Multi-Feature outlier found at index = 1796
Multi-Feature outlier found at index = 1798
Multi-Feature outlier found at index = 1799
Multi-Feature outlier found at index = 1801
Multi-Feature outlier found at index = 1805
Multi-Feature outlier found at index = 1809
Multi-Feature outlier found at index = 1811
Multi-Feature outlier found at index = 1813
Multi-Feature outlier found at index = 1814
Multi-Feature outlier found at index = 1840
Multi-Feature outlier found at index = 1843
Multi-Feature outlier found at index = 1845
Multi-Feature outlier found at index = 1849
Multi-Feature outlier found at index = 1855
Multi-Feature outlier found at index = 1856
Multi-Feature outlier found at index = 1857
Multi-Feature outlier found at index = 1861
Multi-Feature outlier found at index = 1870
Multi-Feature outlier found at index = 1872
Multi-Feature outlier found at index = 1873
Multi-Feature outlier found at index = 1876
Multi-Feature outlier found at index = 1877
Multi-Feature outlier found at index = 1878
Multi-Feature outlier found at index = 1894
Multi-Feature outlier found at index = 1920
Multi-Feature outlier found at index = 1947
Multi-Feature outlier found at index = 1966
Multi-Feature outlier found at index = 1982
Multi-Feature outlier found at index = 1983
Multi-Feature outlier found at index = 1988
Multi-Feature outlier found at index = 2017
Multi-Feature outlier found at index = 2020
Multi-Feature outlier found at index = 2022
Multi-Feature outlier found at index = 2026
Multi-Feature outlier found at index = 2044
Multi-Feature outlier found at index = 2064
Multi-Feature outlier found at index = 2068
Multi-Feature outlier found at index = 2072
Multi-Feature outlier found at index = 2081
Multi-Feature outlier found at index = 2099
Multi-Feature outlier found at index = 2103
Multi-Feature outlier found at index = 2106
Multi-Feature outlier found at index = 2107
Multi-Feature outlier found at index = 2112
Multi-Feature outlier found at index = 2113
Multi-Feature outlier found at index = 2131
Multi-Feature outlier found at index = 2136
Multi-Feature outlier found at index = 2139
Multi-Feature outlier found at index = 2142
Multi-Feature outlier found at index = 2155
Multi-Feature outlier found at index = 2158
Multi-Feature outlier found at index = 2162
Multi-Feature outlier found at index = 2170
Multi-Feature outlier found at index = 2172
Multi-Feature outlier found at index = 2174
Multi-Feature outlier found at index = 2175
Multi-Feature outlier found at index = 2177
Multi-Feature outlier found at index = 2184
Multi-Feature outlier found at index = 2187
Multi-Feature outlier found at index = 2190
Multi-Feature outlier found at index = 2191
Multi-Feature outlier found at index = 2210
Multi-Feature outlier found at index = 2212
Multi-Feature outlier found at index = 2228
Multi-Feature outlier found at index = 2244
Multi-Feature outlier found at index = 2250
Multi-Feature outlier found at index = 2258
Multi-Feature outlier found at index = 2261
Multi-Feature outlier found at index = 2266
Multi-Feature outlier found at index = 2273
Multi-Feature outlier found at index = 2279
Multi-Feature outlier found at index = 2285
Multi-Feature outlier found at index = 2288
Multi-Feature outlier found at index = 2290
Multi-Feature outlier found at index = 2296
Multi-Feature outlier found at index = 2298
Multi-Feature outlier found at index = 2300
Multi-Feature outlier found at index = 2306
Multi-Feature outlier found at index = 2309
Multi-Feature outlier found at index = 2311
Multi-Feature outlier found at index = 2326
Multi-Feature outlier found at index = 2329
Multi-Feature outlier found at index = 2350
Multi-Feature outlier found at index = 2353
Multi-Feature outlier found at index = 2358
Multi-Feature outlier found at index = 2362
Multi-Feature outlier found at index = 2368
Multi-Feature outlier found at index = 2369
Multi-Feature outlier found at index = 2370
Multi-Feature outlier found at index = 2371
Multi-Feature outlier found at index = 2379
Multi-Feature outlier found at index = 2385
Multi-Feature outlier found at index = 2389
Multi-Feature outlier found at index = 2394
Multi-Feature outlier found at index = 2395
Multi-Feature outlier found at index = 2397
Multi-Feature outlier found at index = 2411
Multi-Feature outlier found at index = 2412
Multi-Feature outlier found at index = 2417
Multi-Feature outlier found at index = 2421
Multi-Feature outlier found at index = 2435
Multi-Feature outlier found at index = 2439
Multi-Feature outlier found at index = 2440
Multi-Feature outlier found at index = 2462
Multi-Feature outlier found at index = 2466
Multi-Feature outlier found at index = 2468
Multi-Feature outlier found at index = 2477
Multi-Feature outlier found at index = 2478
Multi-Feature outlier found at index = 2482
Multi-Feature outlier found at index = 2485
Multi-Feature outlier found at index = 2498
Multi-Feature outlier found at index = 2510
Multi-Feature outlier found at index = 2515
Multi-Feature outlier found at index = 2517
Multi-Feature outlier found at index = 2523
Multi-Feature outlier found at index = 2527
Multi-Feature outlier found at index = 2531
Multi-Feature outlier found at index = 2537
Multi-Feature outlier found at index = 2547
Multi-Feature outlier found at index = 2559
Multi-Feature outlier found at index = 2562
Multi-Feature outlier found at index = 2565
Multi-Feature outlier found at index = 2566
Multi-Feature outlier found at index = 2575
Multi-Feature outlier found at index = 2576
Multi-Feature outlier found at index = 2577
Multi-Feature outlier found at index = 2578
Multi-Feature outlier found at index = 2586
Multi-Feature outlier found at index = 2589
Multi-Feature outlier found at index = 2590
Multi-Feature outlier found at index = 2594
Multi-Feature outlier found at index = 2595
Multi-Feature outlier found at index = 2598
Multi-Feature outlier found at index = 2600
Multi-Feature outlier found at index = 2613
Multi-Feature outlier found at index = 2616
Multi-Feature outlier found at index = 2621
Multi-Feature outlier found at index = 2625
Multi-Feature outlier found at index = 2626
Multi-Feature outlier found at index = 2632
Multi-Feature outlier found at index = 2633
Multi-Feature outlier found at index = 2646
Multi-Feature outlier found at index = 2649
Multi-Feature outlier found at index = 2650
Multi-Feature outlier found at index = 2654
Multi-Feature outlier found at index = 2655
Multi-Feature outlier found at index = 2660
Multi-Feature outlier found at index = 2667
Multi-Feature outlier found at index = 2670
Multi-Feature outlier found at index = 2680
Multi-Feature outlier found at index = 2684
Multi-Feature outlier found at index = 2693
Multi-Feature outlier found at index = 2705
Multi-Feature outlier found at index = 2706
Multi-Feature outlier found at index = 2708
Multi-Feature outlier found at index = 2709
Multi-Feature outlier found at index = 2712
Multi-Feature outlier found at index = 2713
Multi-Feature outlier found at index = 2720
Multi-Feature outlier found at index = 2721
Multi-Feature outlier found at index = 2728
Multi-Feature outlier found at index = 2730
Multi-Feature outlier found at index = 2736
Multi-Feature outlier found at index = 2745
Multi-Feature outlier found at index = 2759
Multi-Feature outlier found at index = 2760
Multi-Feature outlier found at index = 2768
Multi-Feature outlier found at index = 2775
Multi-Feature outlier found at index = 2788
Multi-Feature outlier found at index = 2797
Multi-Feature outlier found at index = 2801
Multi-Feature outlier found at index = 2807
Multi-Feature outlier found at index = 2808
Multi-Feature outlier found at index = 2817
Multi-Feature outlier found at index = 2818
Multi-Feature outlier found at index = 2825
Multi-Feature outlier found at index = 2827
Multi-Feature outlier found at index = 2833
Multi-Feature outlier found at index = 2837
Multi-Feature outlier found at index = 2838
Multi-Feature outlier found at index = 2845
Multi-Feature outlier found at index = 2847
Multi-Feature outlier found at index = 2848
Multi-Feature outlier found at index = 2860
Multi-Feature outlier found at index = 2863
Multi-Feature outlier found at index = 2873
Multi-Feature outlier found at index = 2876
Multi-Feature outlier found at index = 2894
Multi-Feature outlier found at index = 2896
Multi-Feature outlier found at index = 2897
Multi-Feature outlier found at index = 2899
Multi-Feature outlier found at index = 2901
Multi-Feature outlier found at index = 2915
Multi-Feature outlier found at index = 2919
Multi-Feature outlier found at index = 2943
Multi-Feature outlier found at index = 2945
Multi-Feature outlier found at index = 2950
Multi-Feature outlier found at index = 2954
Multi-Feature outlier found at index = 2956
Multi-Feature outlier found at index = 2957
Multi-Feature outlier found at index = 2961
Multi-Feature outlier found at index = 2977
Multi-Feature outlier found at index = 2981
Multi-Feature outlier found at index = 3000
Multi-Feature outlier found at index = 3012
Multi-Feature outlier found at index = 3025
Multi-Feature outlier found at index = 3026
Multi-Feature outlier found at index = 3039
Multi-Feature outlier found at index = 3054
Multi-Feature outlier found at index = 3060
Multi-Feature outlier found at index = 3062
Multi-Feature outlier found at index = 3066
Multi-Feature outlier found at index = 3067
Multi-Feature outlier found at index = 3072
Multi-Feature outlier found at index = 3080
Multi-Feature outlier found at index = 3089
Multi-Feature outlier found at index = 3093
Multi-Feature outlier found at index = 3097
Multi-Feature outlier found at index = 3103
Multi-Feature outlier found at index = 3110
Multi-Feature outlier found at index = 3115
Multi-Feature outlier found at index = 3127
Multi-Feature outlier found at index = 3128
Multi-Feature outlier found at index = 3131
Multi-Feature outlier found at index = 3133
Multi-Feature outlier found at index = 3134
Multi-Feature outlier found at index = 3140
Multi-Feature outlier found at index = 3145
Multi-Feature outlier found at index = 3155
Multi-Feature outlier found at index = 3157
Multi-Feature outlier found at index = 3170
Multi-Feature outlier found at index = 3178
Multi-Feature outlier found at index = 3188
Multi-Feature outlier found at index = 3189
Multi-Feature outlier found at index = 3190
Multi-Feature outlier found at index = 3201
Multi-Feature outlier found at index = 3203
Multi-Feature outlier found at index = 3204
Multi-Feature outlier found at index = 3209
Multi-Feature outlier found at index = 3214
Multi-Feature outlier found at index = 3216
Multi-Feature outlier found at index = 3218
Multi-Feature outlier found at index = 3221
Multi-Feature outlier found at index = 3228
Multi-Feature outlier found at index = 154
Multi-Feature outlier found at index = 210
Multi-Feature outlier found at index = 267
Multi-Feature outlier found at index = 340
Multi-Feature outlier found at index = 351
Multi-Feature outlier found at index = 359
Multi-Feature outlier found at index = 416
Multi-Feature outlier found at index = 454
Multi-Feature outlier found at index = 555
Multi-Feature outlier found at index = 696
Multi-Feature outlier found at index = 731
Multi-Feature outlier found at index = 756
Multi-Feature outlier found at index = 765
Multi-Feature outlier found at index = 770
Multi-Feature outlier found at index = 876
Multi-Feature outlier found at index = 905
Multi-Feature outlier found at index = 935
Multi-Feature outlier found at index = 961
Multi-Feature outlier found at index = 999
Multi-Feature outlier found at index = 1086
Multi-Feature outlier found at index = 1098
Multi-Feature outlier found at index = 1113
Multi-Feature outlier found at index = 1191
Multi-Feature outlier found at index = 1252
Multi-Feature outlier found at index = 1260
Multi-Feature outlier found at index = 1323
Multi-Feature outlier found at index = 1348
Multi-Feature outlier found at index = 1449
Multi-Feature outlier found at index = 1503
Multi-Feature outlier found at index = 1581
Multi-Feature outlier found at index = 1675
Multi-Feature outlier found at index = 1683
Multi-Feature outlier found at index = 1722
Multi-Feature outlier found at index = 1733
Multi-Feature outlier found at index = 1751
Multi-Feature outlier found at index = 1803
Multi-Feature outlier found at index = 1863
Multi-Feature outlier found at index = 2114
Multi-Feature outlier found at index = 2310
Multi-Feature outlier found at index = 2330
Multi-Feature outlier found at index = 2422
Multi-Feature outlier found at index = 2424
Multi-Feature outlier found at index = 2432
Multi-Feature outlier found at index = 2486
Multi-Feature outlier found at index = 2525
Multi-Feature outlier found at index = 2552
Multi-Feature outlier found at index = 2563
Multi-Feature outlier found at index = 2656
Multi-Feature outlier found at index = 2734
Multi-Feature outlier found at index = 2742
Multi-Feature outlier found at index = 2910
Multi-Feature outlier found at index = 2918
Multi-Feature outlier found at index = 2951
Multi-Feature outlier found at index = 3045
Multi-Feature outlier found at index = 3046
Multi-Feature outlier found at index = 3049
Multi-Feature outlier found at index = 3159
Multi-Feature outlier found at index = 3
Multi-Feature outlier found at index = 6
Multi-Feature outlier found at index = 29
Multi-Feature outlier found at index = 42
Multi-Feature outlier found at index = 69
Multi-Feature outlier found at index = 97
Multi-Feature outlier found at index = 106
Multi-Feature outlier found at index = 172
Multi-Feature outlier found at index = 186
Multi-Feature outlier found at index = 196
Multi-Feature outlier found at index = 203
Multi-Feature outlier found at index = 249
Multi-Feature outlier found at index = 265
Multi-Feature outlier found at index = 270
Multi-Feature outlier found at index = 302
Multi-Feature outlier found at index = 309
Multi-Feature outlier found at index = 374
Multi-Feature outlier found at index = 379
Multi-Feature outlier found at index = 389
Multi-Feature outlier found at index = 397
Multi-Feature outlier found at index = 405
Multi-Feature outlier found at index = 418
Multi-Feature outlier found at index = 419
Multi-Feature outlier found at index = 437
Multi-Feature outlier found at index = 448
Multi-Feature outlier found at index = 489
Multi-Feature outlier found at index = 502
Multi-Feature outlier found at index = 528
Multi-Feature outlier found at index = 534
Multi-Feature outlier found at index = 566
Multi-Feature outlier found at index = 571
Multi-Feature outlier found at index = 583
Multi-Feature outlier found at index = 599
Multi-Feature outlier found at index = 614
Multi-Feature outlier found at index = 627
Multi-Feature outlier found at index = 656
Multi-Feature outlier found at index = 661
Multi-Feature outlier found at index = 697
Multi-Feature outlier found at index = 746
Multi-Feature outlier found at index = 787
Multi-Feature outlier found at index = 835
Multi-Feature outlier found at index = 845
Multi-Feature outlier found at index = 857
Multi-Feature outlier found at index = 880
Multi-Feature outlier found at index = 891
Multi-Feature outlier found at index = 898
Multi-Feature outlier found at index = 906
Multi-Feature outlier found at index = 942
Multi-Feature outlier found at index = 955
Multi-Feature outlier found at index = 1005
Multi-Feature outlier found at index = 1012
Multi-Feature outlier found at index = 1017
Multi-Feature outlier found at index = 1020
Multi-Feature outlier found at index = 1022
Multi-Feature outlier found at index = 1036
Multi-Feature outlier found at index = 1042
Multi-Feature outlier found at index = 1046
Multi-Feature outlier found at index = 1065
Multi-Feature outlier found at index = 1069
Multi-Feature outlier found at index = 1103
Multi-Feature outlier found at index = 1114
Multi-Feature outlier found at index = 1134
Multi-Feature outlier found at index = 1148
Multi-Feature outlier found at index = 1190
Multi-Feature outlier found at index = 1221
Multi-Feature outlier found at index = 1237
Multi-Feature outlier found at index = 1256
Multi-Feature outlier found at index = 1269
Multi-Feature outlier found at index = 1296
Multi-Feature outlier found at index = 1341
Multi-Feature outlier found at index = 1346
Multi-Feature outlier found at index = 1360
Multi-Feature outlier found at index = 1400
Multi-Feature outlier found at index = 1481
Multi-Feature outlier found at index = 1484
Multi-Feature outlier found at index = 1518
Multi-Feature outlier found at index = 1543
Multi-Feature outlier found at index = 1567
Multi-Feature outlier found at index = 1575
Multi-Feature outlier found at index = 1622
Multi-Feature outlier found at index = 1630
Multi-Feature outlier found at index = 1640
Multi-Feature outlier found at index = 1645
Multi-Feature outlier found at index = 1674
Multi-Feature outlier found at index = 1685
Multi-Feature outlier found at index = 1701
Multi-Feature outlier found at index = 1709
Multi-Feature outlier found at index = 1711
Multi-Feature outlier found at index = 1726
Multi-Feature outlier found at index = 1782
Multi-Feature outlier found at index = 1822
Multi-Feature outlier found at index = 1835
Multi-Feature outlier found at index = 1839
Multi-Feature outlier found at index = 1854
Multi-Feature outlier found at index = 1859
Multi-Feature outlier found at index = 1860
Multi-Feature outlier found at index = 1868
Multi-Feature outlier found at index = 1881
Multi-Feature outlier found at index = 1890
Multi-Feature outlier found at index = 1905
Multi-Feature outlier found at index = 1910
Multi-Feature outlier found at index = 1914
Multi-Feature outlier found at index = 1919
Multi-Feature outlier found at index = 1928
Multi-Feature outlier found at index = 1929
Multi-Feature outlier found at index = 1942
Multi-Feature outlier found at index = 1943
Multi-Feature outlier found at index = 1949
Multi-Feature outlier found at index = 1951
Multi-Feature outlier found at index = 1980
Multi-Feature outlier found at index = 1999
Multi-Feature outlier found at index = 2006
Multi-Feature outlier found at index = 2010
Multi-Feature outlier found at index = 2012
Multi-Feature outlier found at index = 2019
Multi-Feature outlier found at index = 2042
Multi-Feature outlier found at index = 2049
Multi-Feature outlier found at index = 2052
Multi-Feature outlier found at index = 2054
Multi-Feature outlier found at index = 2089
Multi-Feature outlier found at index = 2115
Multi-Feature outlier found at index = 2126
Multi-Feature outlier found at index = 2189
Multi-Feature outlier found at index = 2198
Multi-Feature outlier found at index = 2238
Multi-Feature outlier found at index = 2252
Multi-Feature outlier found at index = 2262
Multi-Feature outlier found at index = 2263
Multi-Feature outlier found at index = 2265
Multi-Feature outlier found at index = 2272
Multi-Feature outlier found at index = 2277
Multi-Feature outlier found at index = 2278
Multi-Feature outlier found at index = 2284
Multi-Feature outlier found at index = 2312
Multi-Feature outlier found at index = 2332
Multi-Feature outlier found at index = 2360
Multi-Feature outlier found at index = 2366
Multi-Feature outlier found at index = 2372
Multi-Feature outlier found at index = 2374
Multi-Feature outlier found at index = 2376
Multi-Feature outlier found at index = 2386
Multi-Feature outlier found at index = 2401
Multi-Feature outlier found at index = 2404
Multi-Feature outlier found at index = 2413
Multi-Feature outlier found at index = 2488
Multi-Feature outlier found at index = 2505
Multi-Feature outlier found at index = 2506
Multi-Feature outlier found at index = 2553
Multi-Feature outlier found at index = 2557
Multi-Feature outlier found at index = 2591
Multi-Feature outlier found at index = 2607
Multi-Feature outlier found at index = 2623
Multi-Feature outlier found at index = 2642
Multi-Feature outlier found at index = 2689
Multi-Feature outlier found at index = 2711
Multi-Feature outlier found at index = 2714
Multi-Feature outlier found at index = 2724
Multi-Feature outlier found at index = 2765
Multi-Feature outlier found at index = 2812
Multi-Feature outlier found at index = 2820
Multi-Feature outlier found at index = 2822
Multi-Feature outlier found at index = 2844
Multi-Feature outlier found at index = 2855
Multi-Feature outlier found at index = 2869
Multi-Feature outlier found at index = 2870
Multi-Feature outlier found at index = 2884
Multi-Feature outlier found at index = 2888
Multi-Feature outlier found at index = 2904
Multi-Feature outlier found at index = 2911
Multi-Feature outlier found at index = 2923
Multi-Feature outlier found at index = 2925
Multi-Feature outlier found at index = 2931
Multi-Feature outlier found at index = 2932
Multi-Feature outlier found at index = 2965
Multi-Feature outlier found at index = 2986
Multi-Feature outlier found at index = 2996
Multi-Feature outlier found at index = 3009
Multi-Feature outlier found at index = 3022
Multi-Feature outlier found at index = 3041
Multi-Feature outlier found at index = 3044
Multi-Feature outlier found at index = 3047
Multi-Feature outlier found at index = 3064
Multi-Feature outlier found at index = 3073
Multi-Feature outlier found at index = 3088
Multi-Feature outlier found at index = 3102
Multi-Feature outlier found at index = 3120
Multi-Feature outlier found at index = 3136
Multi-Feature outlier found at index = 3139
Multi-Feature outlier found at index = 3180
Multi-Feature outlier found at index = 3191
Multi-Feature outlier found at index = 3219
Multi-Feature outlier found at index = 63
Multi-Feature outlier found at index = 150
Multi-Feature outlier found at index = 289
Multi-Feature outlier found at index = 317
Multi-Feature outlier found at index = 458
Multi-Feature outlier found at index = 482
Multi-Feature outlier found at index = 509
Multi-Feature outlier found at index = 521
Multi-Feature outlier found at index = 613
Multi-Feature outlier found at index = 650
Multi-Feature outlier found at index = 703
Multi-Feature outlier found at index = 784
Multi-Feature outlier found at index = 836
Multi-Feature outlier found at index = 855
Multi-Feature outlier found at index = 866
Multi-Feature outlier found at index = 904
Multi-Feature outlier found at index = 911
Multi-Feature outlier found at index = 958
Multi-Feature outlier found at index = 1089
Multi-Feature outlier found at index = 1104
Multi-Feature outlier found at index = 1157
Multi-Feature outlier found at index = 1261
Multi-Feature outlier found at index = 1262
Multi-Feature outlier found at index = 1310
Multi-Feature outlier found at index = 1316
Multi-Feature outlier found at index = 1383
Multi-Feature outlier found at index = 1391
Multi-Feature outlier found at index = 1406
Multi-Feature outlier found at index = 1510
Multi-Feature outlier found at index = 1541
Multi-Feature outlier found at index = 1592
Multi-Feature outlier found at index = 1691
Multi-Feature outlier found at index = 1736
Multi-Feature outlier found at index = 1899
Multi-Feature outlier found at index = 1912
Multi-Feature outlier found at index = 1923
Multi-Feature outlier found at index = 1924
Multi-Feature outlier found at index = 1936
Multi-Feature outlier found at index = 1940
Multi-Feature outlier found at index = 1976
Multi-Feature outlier found at index = 2011
Multi-Feature outlier found at index = 2047
Multi-Feature outlier found at index = 2079
Multi-Feature outlier found at index = 2160
Multi-Feature outlier found at index = 2164
Multi-Feature outlier found at index = 2280
Multi-Feature outlier found at index = 2286
Multi-Feature outlier found at index = 2307
Multi-Feature outlier found at index = 2318
Multi-Feature outlier found at index = 2319
Multi-Feature outlier found at index = 2334
Multi-Feature outlier found at index = 2448
Multi-Feature outlier found at index = 2603
Multi-Feature outlier found at index = 2669
Multi-Feature outlier found at index = 2676
Multi-Feature outlier found at index = 2729
Multi-Feature outlier found at index = 2749
Multi-Feature outlier found at index = 2787
Multi-Feature outlier found at index = 2795
Multi-Feature outlier found at index = 3008
Multi-Feature outlier found at index = 3052
Multi-Feature outlier found at index = 3061
Multi-Feature outlier found at index = 3063
Multi-Feature outlier found at index = 3070
Multi-Feature outlier found at index = 3092
Multi-Feature outlier found at index = 3144
Multi-Feature outlier found at index = 3167
Multi-Feature outlier found at index = 1
Multi-Feature outlier found at index = 18
Multi-Feature outlier found at index = 48
Multi-Feature outlier found at index = 80
Multi-Feature outlier found at index = 149
Multi-Feature outlier found at index = 152
Multi-Feature outlier found at index = 187
Multi-Feature outlier found at index = 209
Multi-Feature outlier found at index = 229
Multi-Feature outlier found at index = 376
Multi-Feature outlier found at index = 468
Multi-Feature outlier found at index = 488
Multi-Feature outlier found at index = 523
Multi-Feature outlier found at index = 526
Multi-Feature outlier found at index = 604
Multi-Feature outlier found at index = 722
Multi-Feature outlier found at index = 743
Multi-Feature outlier found at index = 817
Multi-Feature outlier found at index = 818
Multi-Feature outlier found at index = 1068
Multi-Feature outlier found at index = 1153
Multi-Feature outlier found at index = 1170
Multi-Feature outlier found at index = 1173
Multi-Feature outlier found at index = 1182
Multi-Feature outlier found at index = 1408
Multi-Feature outlier found at index = 1415
Multi-Feature outlier found at index = 1495
Multi-Feature outlier found at index = 1544
Multi-Feature outlier found at index = 1608
Multi-Feature outlier found at index = 1668
Multi-Feature outlier found at index = 1753
Multi-Feature outlier found at index = 1776
Multi-Feature outlier found at index = 1869
Multi-Feature outlier found at index = 1948
Multi-Feature outlier found at index = 1986
Multi-Feature outlier found at index = 2032
Multi-Feature outlier found at index = 2133
Multi-Feature outlier found at index = 2183
Multi-Feature outlier found at index = 2264
Multi-Feature outlier found at index = 2281
Multi-Feature outlier found at index = 2337
Multi-Feature outlier found at index = 2456
Multi-Feature outlier found at index = 2516
Multi-Feature outlier found at index = 2536
Multi-Feature outlier found at index = 2675
Multi-Feature outlier found at index = 2803
Multi-Feature outlier found at index = 2809
Multi-Feature outlier found at index = 2852
Multi-Feature outlier found at index = 3023
Multi-Feature outlier found at index = 3031
Multi-Feature outlier found at index = 3075
Multi-Feature outlier found at index = 3161
Multi-Feature outlier found at index = 3199
Multi-Feature outlier found at index = 355
Multi-Feature outlier found at index = 1350
Multi-Feature outlier found at index = 1490
Multi-Feature outlier found at index = 1970
Multi-Feature outlier found at index = 2308
Multi-Feature outlier found at index = 2777
Multi-Feature outlier found at index = 2843
Multi-Feature outlier found at index = 11
Multi-Feature outlier found at index = 50
Multi-Feature outlier found at index = 53
Multi-Feature outlier found at index = 70
Multi-Feature outlier found at index = 117
Multi-Feature outlier found at index = 135
Multi-Feature outlier found at index = 240
Multi-Feature outlier found at index = 476
Multi-Feature outlier found at index = 535
Multi-Feature outlier found at index = 670
Multi-Feature outlier found at index = 1025
Multi-Feature outlier found at index = 1150
Multi-Feature outlier found at index = 1176
Multi-Feature outlier found at index = 1213
Multi-Feature outlier found at index = 1278
Multi-Feature outlier found at index = 1321
Multi-Feature outlier found at index = 1366
Multi-Feature outlier found at index = 1390
Multi-Feature outlier found at index = 1456
Multi-Feature outlier found at index = 1523
Multi-Feature outlier found at index = 1555
Multi-Feature outlier found at index = 1585
Multi-Feature outlier found at index = 1662
Multi-Feature outlier found at index = 1680
Multi-Feature outlier found at index = 1829
Multi-Feature outlier found at index = 1884
Multi-Feature outlier found at index = 1887
Multi-Feature outlier found at index = 1892
Multi-Feature outlier found at index = 1985
Multi-Feature outlier found at index = 1992
Multi-Feature outlier found at index = 2021
Multi-Feature outlier found at index = 2150
Multi-Feature outlier found at index = 2196
Multi-Feature outlier found at index = 2204
Multi-Feature outlier found at index = 2241
Multi-Feature outlier found at index = 2542
Multi-Feature outlier found at index = 2766
Multi-Feature outlier found at index = 2819
Multi-Feature outlier found at index = 2834
Multi-Feature outlier found at index = 2942
Multi-Feature outlier found at index = 3108
Multi-Feature outlier found at index = 3172
Multi-Feature outlier found at index = 8
Multi-Feature outlier found at index = 13
Multi-Feature outlier found at index = 52
Multi-Feature outlier found at index = 54
Multi-Feature outlier found at index = 75
Multi-Feature outlier found at index = 121
Multi-Feature outlier found at index = 124
Multi-Feature outlier found at index = 139
Multi-Feature outlier found at index = 174
Multi-Feature outlier found at index = 225
Multi-Feature outlier found at index = 254
Multi-Feature outlier found at index = 279
Multi-Feature outlier found at index = 287
Multi-Feature outlier found at index = 294
Multi-Feature outlier found at index = 299
Multi-Feature outlier found at index = 310
Multi-Feature outlier found at index = 334
Multi-Feature outlier found at index = 344
Multi-Feature outlier found at index = 349
Multi-Feature outlier found at index = 399
Multi-Feature outlier found at index = 424
Multi-Feature outlier found at index = 513
Multi-Feature outlier found at index = 546
Multi-Feature outlier found at index = 547
Multi-Feature outlier found at index = 702
Multi-Feature outlier found at index = 776
Multi-Feature outlier found at index = 809
Multi-Feature outlier found at index = 812
Multi-Feature outlier found at index = 824
Multi-Feature outlier found at index = 873
Multi-Feature outlier found at index = 879
Multi-Feature outlier found at index = 889
Multi-Feature outlier found at index = 903
Multi-Feature outlier found at index = 953
Multi-Feature outlier found at index = 1063
Multi-Feature outlier found at index = 1077
Multi-Feature outlier found at index = 1178
Multi-Feature outlier found at index = 1246
Multi-Feature outlier found at index = 1249
Multi-Feature outlier found at index = 1324
Multi-Feature outlier found at index = 1478
Multi-Feature outlier found at index = 1564
Multi-Feature outlier found at index = 1571
Multi-Feature outlier found at index = 1692
Multi-Feature outlier found at index = 1698
Multi-Feature outlier found at index = 1719
Multi-Feature outlier found at index = 1723
Multi-Feature outlier found at index = 1946
Multi-Feature outlier found at index = 2018
Multi-Feature outlier found at index = 2048
Multi-Feature outlier found at index = 2067
Multi-Feature outlier found at index = 2093
Multi-Feature outlier found at index = 2111
Multi-Feature outlier found at index = 2117
Multi-Feature outlier found at index = 2135
Multi-Feature outlier found at index = 2156
Multi-Feature outlier found at index = 2173
Multi-Feature outlier found at index = 2356
Multi-Feature outlier found at index = 2499
Multi-Feature outlier found at index = 2673
Multi-Feature outlier found at index = 2677
Multi-Feature outlier found at index = 2692
Multi-Feature outlier found at index = 2738
Multi-Feature outlier found at index = 2767
Multi-Feature outlier found at index = 2789
Multi-Feature outlier found at index = 2851
Multi-Feature outlier found at index = 2858
Multi-Feature outlier found at index = 2966
Multi-Feature outlier found at index = 3029
Multi-Feature outlier found at index = 3065
Multi-Feature outlier found at index = 3195
Multi-Feature outlier found at index = 37
Multi-Feature outlier found at index = 114
Multi-Feature outlier found at index = 136
Multi-Feature outlier found at index = 175
Multi-Feature outlier found at index = 211
Multi-Feature outlier found at index = 214
Multi-Feature outlier found at index = 252
Multi-Feature outlier found at index = 277
Multi-Feature outlier found at index = 284
Multi-Feature outlier found at index = 365
Multi-Feature outlier found at index = 410
Multi-Feature outlier found at index = 415
Multi-Feature outlier found at index = 432
Multi-Feature outlier found at index = 510
Multi-Feature outlier found at index = 512
Multi-Feature outlier found at index = 589
Multi-Feature outlier found at index = 600
Multi-Feature outlier found at index = 629
Multi-Feature outlier found at index = 644
Multi-Feature outlier found at index = 686
Multi-Feature outlier found at index = 730
Multi-Feature outlier found at index = 918
Multi-Feature outlier found at index = 926
Multi-Feature outlier found at index = 938
Multi-Feature outlier found at index = 1043
Multi-Feature outlier found at index = 1106
Multi-Feature outlier found at index = 1137
Multi-Feature outlier found at index = 1175
Multi-Feature outlier found at index = 1227
Multi-Feature outlier found at index = 1229
Multi-Feature outlier found at index = 1314
Multi-Feature outlier found at index = 1377
Multi-Feature outlier found at index = 1519
Multi-Feature outlier found at index = 1688
Multi-Feature outlier found at index = 1730
Multi-Feature outlier found at index = 1734
Multi-Feature outlier found at index = 1908
Multi-Feature outlier found at index = 1968
Multi-Feature outlier found at index = 2002
Multi-Feature outlier found at index = 2058
Multi-Feature outlier found at index = 2066
Multi-Feature outlier found at index = 2154
Multi-Feature outlier found at index = 2168
Multi-Feature outlier found at index = 2180
Multi-Feature outlier found at index = 2206
Multi-Feature outlier found at index = 2341
Multi-Feature outlier found at index = 2355
Multi-Feature outlier found at index = 2382
Multi-Feature outlier found at index = 2409
Multi-Feature outlier found at index = 2496
Multi-Feature outlier found at index = 2500
Multi-Feature outlier found at index = 2555
Multi-Feature outlier found at index = 2581
Multi-Feature outlier found at index = 2599
Multi-Feature outlier found at index = 2601
Multi-Feature outlier found at index = 2604
Multi-Feature outlier found at index = 2702
Multi-Feature outlier found at index = 2715
Multi-Feature outlier found at index = 2723
Multi-Feature outlier found at index = 2739
Multi-Feature outlier found at index = 2756
Multi-Feature outlier found at index = 2821
Multi-Feature outlier found at index = 2902
Multi-Feature outlier found at index = 2940
Multi-Feature outlier found at index = 3033
Multi-Feature outlier found at index = 3042
Multi-Feature outlier found at index = 3184
Multi-Feature outlier found at index = 131
Multi-Feature outlier found at index = 197
Multi-Feature outlier found at index = 244
Multi-Feature outlier found at index = 470
Multi-Feature outlier found at index = 477
Multi-Feature outlier found at index = 672
Multi-Feature outlier found at index = 794
Multi-Feature outlier found at index = 826
Multi-Feature outlier found at index = 852
Multi-Feature outlier found at index = 948
Multi-Feature outlier found at index = 950
Multi-Feature outlier found at index = 1030
Multi-Feature outlier found at index = 1035
Multi-Feature outlier found at index = 1056
Multi-Feature outlier found at index = 1067
Multi-Feature outlier found at index = 1096
Multi-Feature outlier found at index = 1118
Multi-Feature outlier found at index = 1194
Multi-Feature outlier found at index = 1199
Multi-Feature outlier found at index = 1208
Multi-Feature outlier found at index = 1268
Multi-Feature outlier found at index = 1307
Multi-Feature outlier found at index = 1313
Multi-Feature outlier found at index = 1315
Multi-Feature outlier found at index = 1351
Multi-Feature outlier found at index = 1370
Multi-Feature outlier found at index = 1374
Multi-Feature outlier found at index = 1434
Multi-Feature outlier found at index = 1439
Multi-Feature outlier found at index = 1457
Multi-Feature outlier found at index = 1473
Multi-Feature outlier found at index = 1546
Multi-Feature outlier found at index = 1551
Multi-Feature outlier found at index = 1563
Multi-Feature outlier found at index = 1589
Multi-Feature outlier found at index = 1617
Multi-Feature outlier found at index = 1621
Multi-Feature outlier found at index = 1867
Multi-Feature outlier found at index = 1932
Multi-Feature outlier found at index = 1996
Multi-Feature outlier found at index = 2023
Multi-Feature outlier found at index = 2046
Multi-Feature outlier found at index = 2085
Multi-Feature outlier found at index = 2121
Multi-Feature outlier found at index = 2169
Multi-Feature outlier found at index = 2235
Multi-Feature outlier found at index = 2256
Multi-Feature outlier found at index = 2406
Multi-Feature outlier found at index = 2470
Multi-Feature outlier found at index = 2671
Multi-Feature outlier found at index = 2697
Multi-Feature outlier found at index = 2784
Multi-Feature outlier found at index = 2786
Multi-Feature outlier found at index = 2828
Multi-Feature outlier found at index = 2857
Multi-Feature outlier found at index = 2973
Multi-Feature outlier found at index = 3068
Multi-Feature outlier found at index = 3105
Multi-Feature outlier found at index = 3152
Multi-Feature outlier found at index = 20
Multi-Feature outlier found at index = 65
Multi-Feature outlier found at index = 67
Multi-Feature outlier found at index = 73
Multi-Feature outlier found at index = 179
Multi-Feature outlier found at index = 216
Multi-Feature outlier found at index = 231
Multi-Feature outlier found at index = 556
Multi-Feature outlier found at index = 577
Multi-Feature outlier found at index = 586
Multi-Feature outlier found at index = 590
Multi-Feature outlier found at index = 666
Multi-Feature outlier found at index = 682
Multi-Feature outlier found at index = 797
Multi-Feature outlier found at index = 804
Multi-Feature outlier found at index = 910
Multi-Feature outlier found at index = 919
Multi-Feature outlier found at index = 994
Multi-Feature outlier found at index = 1074
Multi-Feature outlier found at index = 1126
Multi-Feature outlier found at index = 1189
Multi-Feature outlier found at index = 1207
Multi-Feature outlier found at index = 1318
Multi-Feature outlier found at index = 1458
Multi-Feature outlier found at index = 1524
Multi-Feature outlier found at index = 1573
Multi-Feature outlier found at index = 1574
Multi-Feature outlier found at index = 1681
Multi-Feature outlier found at index = 1772
Multi-Feature outlier found at index = 1841
Multi-Feature outlier found at index = 1846
Multi-Feature outlier found at index = 1993
Multi-Feature outlier found at index = 2080
Multi-Feature outlier found at index = 2096
Multi-Feature outlier found at index = 2122
Multi-Feature outlier found at index = 2297
Multi-Feature outlier found at index = 2320
Multi-Feature outlier found at index = 2381
Multi-Feature outlier found at index = 2534
Multi-Feature outlier found at index = 2612
Multi-Feature outlier found at index = 2627
Multi-Feature outlier found at index = 2710
Multi-Feature outlier found at index = 2914
Multi-Feature outlier found at index = 2944
Multi-Feature outlier found at index = 2978
Multi-Feature outlier found at index = 3034
Multi-Feature outlier found at index = 3146
Multi-Feature outlier found at index = 3182
Multi-Feature outlier found at index = 45
Multi-Feature outlier found at index = 86
Multi-Feature outlier found at index = 90
Multi-Feature outlier found at index = 108
Multi-Feature outlier found at index = 164
Multi-Feature outlier found at index = 166
Multi-Feature outlier found at index = 188
Multi-Feature outlier found at index = 495
Multi-Feature outlier found at index = 525
Multi-Feature outlier found at index = 591
Multi-Feature outlier found at index = 608
Multi-Feature outlier found at index = 615
Multi-Feature outlier found at index = 634
Multi-Feature outlier found at index = 673
Multi-Feature outlier found at index = 687
Multi-Feature outlier found at index = 704
Multi-Feature outlier found at index = 858
Multi-Feature outlier found at index = 1120
Multi-Feature outlier found at index = 1181
Multi-Feature outlier found at index = 1223
Multi-Feature outlier found at index = 1251
Multi-Feature outlier found at index = 1305
Multi-Feature outlier found at index = 1322
Multi-Feature outlier found at index = 1349
Multi-Feature outlier found at index = 1399
Multi-Feature outlier found at index = 1475
Multi-Feature outlier found at index = 1654
Multi-Feature outlier found at index = 1682
Multi-Feature outlier found at index = 1765
Multi-Feature outlier found at index = 1825
Multi-Feature outlier found at index = 1927
Multi-Feature outlier found at index = 1964
Multi-Feature outlier found at index = 2070
Multi-Feature outlier found at index = 2073
Multi-Feature outlier found at index = 2088
Multi-Feature outlier found at index = 2199
Multi-Feature outlier found at index = 2396
Multi-Feature outlier found at index = 2420
Multi-Feature outlier found at index = 2475
Multi-Feature outlier found at index = 2582
Multi-Feature outlier found at index = 2593
Multi-Feature outlier found at index = 2647
Multi-Feature outlier found at index = 2805
Multi-Feature outlier found at index = 2907
Multi-Feature outlier found at index = 2922
Multi-Feature outlier found at index = 3036
Multi-Feature outlier found at index = 3113
Multi-Feature outlier found at index = 3176
Multi-Feature outlier found at index = 31
Multi-Feature outlier found at index = 407
Multi-Feature outlier found at index = 886
Multi-Feature outlier found at index = 1231
Multi-Feature outlier found at index = 1655
Multi-Feature outlier found at index = 1821
Multi-Feature outlier found at index = 2041
Multi-Feature outlier found at index = 2144
Multi-Feature outlier found at index = 2166
Multi-Feature outlier found at index = 2282
Multi-Feature outlier found at index = 2810
Multi-Feature outlier found at index = 2832
Multi-Feature outlier found at index = 3059
Multi-Feature outlier found at index = 141
Multi-Feature outlier found at index = 391
Multi-Feature outlier found at index = 514
Multi-Feature outlier found at index = 529
Multi-Feature outlier found at index = 549
Multi-Feature outlier found at index = 582
Multi-Feature outlier found at index = 619
Multi-Feature outlier found at index = 759
Multi-Feature outlier found at index = 801
Multi-Feature outlier found at index = 877
Multi-Feature outlier found at index = 929
Multi-Feature outlier found at index = 987
Multi-Feature outlier found at index = 1039
Multi-Feature outlier found at index = 1161
Multi-Feature outlier found at index = 1335
Multi-Feature outlier found at index = 1436
Multi-Feature outlier found at index = 1437
Multi-Feature outlier found at index = 1735
Multi-Feature outlier found at index = 1779
Multi-Feature outlier found at index = 2015
Multi-Feature outlier found at index = 2182
Multi-Feature outlier found at index = 2431
Multi-Feature outlier found at index = 2480
Multi-Feature outlier found at index = 2513
Multi-Feature outlier found at index = 2561
Multi-Feature outlier found at index = 2791
Multi-Feature outlier found at index = 2853
Multi-Feature outlier found at index = 2868
Multi-Feature outlier found at index = 2885
Multi-Feature outlier found at index = 2960
Multi-Feature outlier found at index = 3104
Multi-Feature outlier found at index = 3117
Multi-Feature outlier found at index = 810
Multi-Feature outlier found at index = 2975
Multi-Feature outlier found at index = 260
Multi-Feature outlier found at index = 368
Multi-Feature outlier found at index = 588
Multi-Feature outlier found at index = 981
Multi-Feature outlier found at index = 1770
Multi-Feature outlier found at index = 2434
Multi-Feature outlier found at index = 2754
Multi-Feature outlier found at index = 64
Multi-Feature outlier found at index = 96
Multi-Feature outlier found at index = 516
Multi-Feature outlier found at index = 572
Multi-Feature outlier found at index = 1160
Multi-Feature outlier found at index = 1577
Multi-Feature outlier found at index = 1651
Multi-Feature outlier found at index = 1727
Multi-Feature outlier found at index = 2512
Multi-Feature outlier found at index = 2570
Multi-Feature outlier found at index = 2771
Multi-Feature outlier found at index = 3082
Multi-Feature outlier found at index = 385
Multi-Feature outlier found at index = 640
Multi-Feature outlier found at index = 1487
Multi-Feature outlier found at index = 1705
Multi-Feature outlier found at index = 1755
Multi-Feature outlier found at index = 2502
Multi-Feature outlier found at index = 2541
Multi-Feature outlier found at index = 621
Multi-Feature outlier found at index = 769
Multi-Feature outlier found at index = 2524
Multi-Feature outlier found at index = 2688
Total multi-Feature outliers found = 1330 out of sample of 3229
In [19]:
#Remove outliers
print('Total outliers detected = '+str(len(outliersN)) + ' out of a total sample of ' + str(len(datac)))
good_data = datac.drop(datac.index[outliersN]).reset_index(drop = True)
print('Multi-class outliers removed, and noisy columns removed. Remaining rows = {}, remaining columns = {}'.format(*good_data.shape))
Total outliers detected = 1330 out of a total sample of 3229
Multi-class outliers removed, and noisy columns removed. Remaining rows = 1899, remaining columns = 33
In [20]:
#Reset Metric Columns after blacklisting
iabCat_hits_cols = [col for col in good_data.columns if 'hits_iabCat' in col]
summary_metric_cols = ['datesCount', 'domainsCount', 'hitsSum','pageViewsSum', 'uniqueViewsSum', 'clicks', 'clickDates']
label_cols = ['acct_id','hq_id','company','userDomain','city','region','countryCode','revenue_mil_usd','total_employees']
continuous_cols = iabCat_hits_cols + ['revenue_mil_usd','total_employees']

#Ensure only columns still remaining
current_cols = good_data.columns
iabCat_hits_cols = sorted(list(set(iabCat_hits_cols) & set(current_cols)))
summary_metric_cols = list(set(summary_metric_cols) & set(current_cols))
label_cols = list(set(label_cols) & set(current_cols))
continuous_cols = sorted(list(set(continuous_cols) & set(current_cols)))

Random Sample Analysis

To get a better understanding of the customers and how their data will transform through the analysis, it would be best to select a few sample data points and explore them in more detail.

In [21]:
#Generate random samples to investigate quickly and visually
portion=0.005
samples = getRandomSample(portion,good_data)
sample_size=len(samples)
print("Chosen samples of Company Intent customers dataset:")
display(samples)

#Visualise sample data
samples_for_plot = samples[summary_metric_cols].copy()
#Add the median of each summary metric as an extra row for comparison
columnsN = len(samples_for_plot.columns)
samples_for_plot.loc[sample_size + 1] = good_data[summary_metric_cols].median()
#Create labels
labels=[]
for i in range(0,sample_size):
    name = 'Sample_'+str(i)
    labels.append(name)

labels.append('Median')
Sample indices selected -  9
(9, 33)
Chosen samples of Company Intent customers dataset:
acct_id hq_id company userDomain city region countryCode revenue_mil_usd total_employees naic3 ... hits_iabCat_IAB_9 hits_iabCat_IAB_10 hits_iabCat_IAB_12 hits_iabCat_IAB_13 hits_iabCat_IAB_15 hits_iabCat_IAB_17 hits_iabCat_IAB_18 hits_iabCat_IAB_19 hits_iabCat_IAB_21 hits_iabCat_IAB_22
0 1-1008UAE 1-YIDRR5 REDACTED REDACTED Anaheim California US 10.10130 90.0 484.0 ... 2.0 1.0 0.0 1.0 2.0 1.0 2.0 7.0 1.0 0.0
1 1-10LADRK 1-QJO-705 REDACTED REDACTED Tranbjerg Midtjylland DK 67.79649 35.0 813.0 ... 1.0 0.0 8.0 0.0 1.0 0.0 0.0 22.0 0.0 2.0
2 1-10Q1AWN 1-10Q1AWN REDACTED REDACTED Lynden Washington US 13.73930 49.0 453.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0
3 1-10QGCTF 1-10QGCTF REDACTED REDACTED Saint Clair Michigan US 9.42010 73.0 484.0 ... 1.0 0.0 8.0 0.0 0.0 3.0 2.0 3.0 0.0 5.0
4 1-105SDKO 1-105SDKO REDACTED REDACTED Portland Oregon US 15.52020 70.0 336.0 ... 0.0 0.0 2.0 2.0 0.0 0.0 0.0 4.0 0.0 0.0
5 1-10FCA5X 1-10FCA5X REDACTED REDACTED Pharr Texas US 10.35140 62.0 442.0 ... 2.0 0.0 2.0 14.0 0.0 4.0 0.0 31.0 0.0 3.0
6 1-10JENYE 1-6HE4W9 REDACTED REDACTED Saint Louis Missouri US 7.31530 175.0 813.0 ... 2.0 3.0 41.0 0.0 1.0 0.0 5.0 5.0 0.0 0.0
7 1-1055N9T 1-1055N9T REDACTED REDACTED Mountain View California US 12.27210 78.0 541.0 ... 3.0 1.0 4.0 1.0 0.0 0.0 1.0 7.0 0.0 7.0
8 1-10CEWY4 1-T4KG99 REDACTED REDACTED Newbury England GB 15.24755 90.0 323.0 ... 0.0 0.0 7.0 1.0 0.0 0.0 9.0 1.0 0.0 0.0

9 rows × 33 columns

In [22]:
#Plot graph
samples_for_plot.plot(kind='bar',figsize=(20,20))
plt.xticks(range(sample_size+1),labels)
plt.show()
  • Some accounts have significantly higher activity than others.
In [23]:
#For a better comparison, we will look at the heat map of the data
summary_data = samples[summary_metric_cols]
percentiles_summary_data = 100*summary_data.rank(pct=True)
In [24]:
plt.figure(figsize=(8,8))
sns.heatmap(percentiles_summary_data, annot=True)
Out[24]:
<matplotlib.axes._subplots.AxesSubplot at 0x1c84e94a0f0>
In [25]:
#Correlation heatmap!
plt.figure(figsize=(10,10))
sns.heatmap(summary_data.corr(),annot=True)
Out[25]:
<matplotlib.axes._subplots.AxesSubplot at 0x1c84f2d7a58>
In [26]:
#Look at breakdown by IAB Cat
continuous_data = good_data[continuous_cols]
#Expensive operation, so run the calculations first and graph separately
correlations = continuous_data.corr()
In [27]:
plt.figure(figsize=(20, 20))
sns.heatmap(correlations,annot=True)
Out[27]:
<matplotlib.axes._subplots.AxesSubplot at 0x1c84f450f28>
  • Potential data quality issue with revenue_mil_usd and total_employees: we would expect these two to be correlated, but the heatmap does not appear to show it.

Predictive Power of Features

A quick analysis to see whether a supervised learner could draw any clear inference from the data.

In [28]:
#Prepare target value (category 19 is what we specialise in selling, so let's try this one)
target = 'hits_iabCat_IAB_19'
#Returns a series for all the target values (same as keying a dict in kdb!)
target_data = continuous_data[target]
#Convert the series to a 1-D NumPy array to use as the target
target_data = target_data.values

#Drop the target from the features; reassigning instead of using inplace=True avoids a SettingWithCopyWarning
continuous_data = continuous_data.drop([target], axis = 1)
print("Company Intent dataset created to predict " + target +  " has {} samples with {} features each.".format(*continuous_data.shape))
Company Intent dataset created to predict hits_iabCat_IAB_19 has 1899 samples with 17 features each.
In [29]:
#Split the data into training and testing sets(0.25)
X_train, X_test, y_train, y_test = train_test_split(continuous_data, target_data, test_size=0.25, random_state=10)

#Create a decision tree regressor and fit it to the training set
regressor = tree.DecisionTreeRegressor(max_depth=25)
regressor = regressor.fit(X_train,y_train)
print('DecisionTreeRegressor fit to training data')

#Predict on the testing set
testPredictions = regressor.predict(X_test)
print('DecisionTreeRegressor test predictions ran')

#Score must be relevant to a regressor(r2 score used)
score = regressor.score(X_test, y_test)
print(score)
DecisionTreeRegressor fit to training data
DecisionTreeRegressor test predictions ran
-0.37974863208758736
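  • Very poor predictions: the negative R² score means that predicting hits_iabCat_IAB_19 solely from the remaining continuous features (mostly IAB activity) does worse than always predicting its mean.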
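
To make the negative score concrete: `regressor.score` reports the coefficient of determination, R² = 1 - SS_res / SS_tot, which is 0 for a model that always predicts the mean of the targets and negative for anything worse. The sketch below recomputes it in plain Python on made-up numbers; the helper name `r2_manual` and the toy values are illustrative assumptions, not part of the notebook.

```python
def r2_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Toy targets, invented for illustration only
y_true = [1.0, 2.0, 3.0, 4.0]

perfect = r2_manual(y_true, y_true)                   # every prediction exact
baseline = r2_manual(y_true, [2.5, 2.5, 2.5, 2.5])    # always predict the mean
reversed_ = r2_manual(y_true, [4.0, 3.0, 2.0, 1.0])   # worse than the mean

print(perfect, baseline, reversed_)
```

A score of about -0.38 therefore sits between the "always predict the mean" baseline and the badly wrong third case.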

Visualize Feature Distributions

To get a better understanding of the dataset, we can construct a scatter matrix of the continuous features remaining in the data. If the feature we attempted to predict above is relevant for identifying a specific customer, the scatter matrix may not show any correlation between that feature and the others. Conversely, if that feature is not relevant for identifying a specific customer, the scatter matrix might show a correlation between it and another feature in the data.

In [30]:
# Produce a scatter matrix for each pair of features in the data
axes = scatter_matrix(continuous_data, alpha=0.75, figsize = (40,40), diagonal = 'kde')
corr = continuous_data.corr().values
for i, j in zip(*np.triu_indices_from(axes, k=1)):
    axes[i, j].annotate("%.3f" %corr[i,j], (0.8, 0.8), xycoords='axes fraction', ha='center', va='center')
  • All the continuous columns are skewed to the right, so the data will need to be transformed.

Benchmark Model

Here I create a basic model to serve as a benchmark. Later models, with optimised parameters and data, are compared against it to gauge how much improvement each iteration generates.

Benchmark Dimensionality Reduction to Visualise

In [31]:
#Limit to only continuous features
bench_data = good_data[continuous_cols]
bench_samples = samples[continuous_cols]
In [32]:
#Apply PCA by fitting the good data with the same number of dimensions as features
components = np.unique(bench_data.keys())
nc = componentsN = len(components)
display(components)
pca = PCA(n_components=componentsN).fit(bench_data)
print('Total dataset components = ' + str(componentsN))

#Transform the good data using the PCA fit above
reduced_data = pca.transform(bench_data)

#Transform samples using the PCA fit above
pca_samples = pca.transform(bench_samples)

# Generate PCA results plot
pca_results = vs.pca_results(bench_data, pca)

#Create a DataFrame for the reduced data
column_names = []
for i in range(1,nc+1):
    name = 'Dimension '+str(i)
    column_names.append(name)
reduced_data = pd.DataFrame(reduced_data, columns = column_names)
array(['hits_iabCat_IAB_1', 'hits_iabCat_IAB_10', 'hits_iabCat_IAB_12',
       'hits_iabCat_IAB_13', 'hits_iabCat_IAB_15', 'hits_iabCat_IAB_17',
       'hits_iabCat_IAB_18', 'hits_iabCat_IAB_19', 'hits_iabCat_IAB_2',
       'hits_iabCat_IAB_21', 'hits_iabCat_IAB_22', 'hits_iabCat_IAB_3',
       'hits_iabCat_IAB_4', 'hits_iabCat_IAB_5', 'hits_iabCat_IAB_7',
       'hits_iabCat_IAB_9', 'revenue_mil_usd', 'total_employees'],
      dtype=object)
Total dataset components = 18
In [33]:
#https://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html
#Apply K-means to the benchmark data before PCA (the PCA-transformed data is clustered in a later cell)
clusterer =  KMeans(n_clusters=3,random_state=1).fit(bench_data)
#Predict the cluster for each data point
preds = clusterer.predict(bench_data)
#Find the cluster centers (not applicable to DB scan)
centers = clusterer.cluster_centers_ 
#display(centers)

#Calculate the mean silhouette coefficient for the number of clusters chosen
score = silhouette_score(bench_data, preds)
print(score)
0.9790863624154715
In [34]:
# Create a biplot
vs.biplot(bench_data, reduced_data, pca)
Out[34]:
<matplotlib.axes._subplots.AxesSubplot at 0x1c85a874cc0>

Creating Benchmark Clusters

In [35]:
#Example https://scikit-learn.org/stable/auto_examples/cluster/plot_dbscan.html#sphx-glr-auto-examples-cluster-plot-dbscan-py
clustersN=3
#Apply your clustering algorithm of choice to the reduced data 
clusterer = KMeans(n_clusters=clustersN,random_state=1).fit(reduced_data)
#Predict the cluster for each data point
preds = clusterer.predict(reduced_data)
#Find the cluster centers (not applicable to DB scan)
centers = clusterer.cluster_centers_ 
#display(centers)

#Calculate the mean silhouette coefficient for the number of clusters chosen
score = silhouette_score(reduced_data, preds)
print(score)
0.9790863624154716

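This appears to be a very good score. However, the visualisation shows it is a false positive: because of the distribution of the data, K-means simply places the vast majority of points into one cluster that is far away from the other 2 clusters. The score also matches the one computed before PCA, because a PCA that keeps all components only rotates the data and so preserves the distances between points.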
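
The effect can be reproduced on toy data. The sketch below is a plain-Python silhouette calculation (the helper `silhouette_mean` and the points are invented for illustration; the notebook itself uses `sklearn.metrics.silhouette_score`). A large bulk of points plus two tiny, distant groups yields a score close to 1 even though nearly everything sits in one cluster.

```python
import math
from collections import Counter

def silhouette_mean(points, labels):
    """Mean silhouette coefficient for points (tuples) with integer labels."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    scores = []
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for key, other in clusters.items() if key != l)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Toy data: 20 points bunched together, plus two far-away pairs
points = [(float(i),) for i in range(20)] + [(1000.0,), (1001.0,), (5000.0,), (5001.0,)]
labels = [0] * 20 + [1, 1, 2, 2]

score = silhouette_mean(points, labels)
sizes = Counter(labels)
print(score, sizes)
```

Checking the cluster sizes next to the score, as the `Counter` does here, is a cheap way to spot this kind of degenerate result.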

Cluster Visualization

In [36]:
# Display the results of the clustering from implementation
vs.cluster_results(reduced_data, preds, centers, pca_samples)

Feature Scaling

In [37]:
#Limit to only continuous features
good_data = good_data[continuous_cols]
samples = samples[continuous_cols]
#Scale the data using the natural logarithm of (x + 1), so that zero values remain defined
log_data = good_data.apply(lambda x: np.log(x+1))
log_samples = samples.apply(lambda x: np.log(x+1))
In [38]:
# Produce a scatter matrix for each pair of newly-transformed features
axes = scatter_matrix(log_data, alpha=0.75, figsize = (40,40), diagonal = 'kde')
corr = log_data.corr().values
for i, j in zip(*np.triu_indices_from(axes, k=1)):
    axes[i, j].annotate("%.3f" %corr[i,j], (0.8, 0.8), xycoords='axes fraction', ha='center', va='center')
In [39]:
#Many columns are still not quite normally distributed. Another option would be the Box-Cox transformation,
#but it requires strictly positive values. WE CAN'T USE IT AS WE HAVE 0 VALUES PRESENT
#boxcox_data = reduced_data.apply(lambda x: boxcox(x)[0])
#scatter_matrix(boxcox_data, alpha = 0.3, figsize = (14,10), diagonal = 'kde');
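
One possible workaround, sketched here under assumptions rather than taken from the notebook, is to shift the counts by 1 before applying Box-Cox, in the same way the log scaling above uses x + 1. The helper `boxcox_value` and the fixed lambda of 0.5 are illustrative; `scipy.stats.boxcox` would instead estimate lambda from the data when none is supplied.

```python
import math

def boxcox_value(x, lam):
    """One-parameter Box-Cox transform of a single strictly positive value."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive input")
    if lam == 0:
        return math.log(x)          # limiting case as lambda -> 0
    return (x ** lam - 1) / lam

hits = [0.0, 1.0, 8.0, 22.0]        # toy hit counts, including a zero
shifted = [x + 1 for x in hits]     # shift so every value is > 0
transformed = [boxcox_value(x, 0.5) for x in shifted]
print(transformed)
```

Whether the shifted transform is preferable to the plain log scaling would need to be checked against the scatter matrix; this only shows that the zero values stop being an obstacle.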

Dimensionality Reduction using PCA

In [40]:
#Apply PCA by fitting the good data with the same number of dimensions as features
components = np.unique(log_data.keys())
componentsN = len(components)
display(components)
pca = PCA(n_components=componentsN).fit(log_data)
print('Total dataset components = ' + str(componentsN))

#Transform the good data using the PCA fit above
reduced_data = pca.transform(log_data)

#Transform log_samples using the PCA fit above
pca_samples = pca.transform(log_samples)

# Generate PCA results plot
pca_results = vs.pca_results(log_data, pca)
array(['hits_iabCat_IAB_1', 'hits_iabCat_IAB_10', 'hits_iabCat_IAB_12',
       'hits_iabCat_IAB_13', 'hits_iabCat_IAB_15', 'hits_iabCat_IAB_17',
       'hits_iabCat_IAB_18', 'hits_iabCat_IAB_19', 'hits_iabCat_IAB_2',
       'hits_iabCat_IAB_21', 'hits_iabCat_IAB_22', 'hits_iabCat_IAB_3',
       'hits_iabCat_IAB_4', 'hits_iabCat_IAB_5', 'hits_iabCat_IAB_7',
       'hits_iabCat_IAB_9', 'revenue_mil_usd', 'total_employees'],
      dtype=object)
Total dataset components = 18
In [41]:
print(pca_results['Explained Variance'].cumsum())
Dimension 1     0.2548
Dimension 2     0.4615
Dimension 3     0.5350
Dimension 4     0.6031
Dimension 5     0.6602
Dimension 6     0.7154
Dimension 7     0.7622
Dimension 8     0.8053
Dimension 9     0.8395
Dimension 10    0.8676
Dimension 11    0.8909
Dimension 12    0.9122
Dimension 13    0.9316
Dimension 14    0.9501
Dimension 15    0.9664
Dimension 16    0.9794
Dimension 17    0.9909
Dimension 18    0.9999
Name: Explained Variance, dtype: float64
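
The cumulative figures above can be turned directly into a component count for a chosen variance threshold. The sketch below copies the printed values into a list; the helper name `components_needed` and the thresholds are assumptions made for illustration.

```python
# Cumulative explained variance, copied from the output above
cumulative = [0.2548, 0.4615, 0.5350, 0.6031, 0.6602, 0.7154, 0.7622, 0.8053, 0.8395,
              0.8676, 0.8909, 0.9122, 0.9316, 0.9501, 0.9664, 0.9794, 0.9909, 0.9999]

def components_needed(cumulative, threshold):
    """Smallest number of leading components whose cumulative variance >= threshold."""
    for n, total in enumerate(cumulative, start=1):
        if total >= threshold:
            return n
    return len(cumulative)

for threshold in (0.80, 0.85, 0.90, 0.95):
    print(threshold, components_needed(cumulative, threshold))
```

On these numbers, 10 components cover roughly 87% of the variance and 12 are needed to pass 90%.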
In [42]:
# Display sample log-data after the PCA transformation (np.round's second argument is the number of decimals, here sample_size = 9)
display(pd.DataFrame(np.round(pca_samples, sample_size), columns = pca_results.index.values))
Dimension 1 Dimension 2 Dimension 3 Dimension 4 Dimension 5 Dimension 6 Dimension 7 Dimension 8 Dimension 9 Dimension 10 Dimension 11 Dimension 12 Dimension 13 Dimension 14 Dimension 15 Dimension 16 Dimension 17 Dimension 18
0 0.373305 -0.167590 -0.654924 0.252178 -1.129142 -0.824513 0.561782 0.318258 0.306335 0.548252 -0.076049 0.641933 0.833478 -0.352668 -0.124509 -0.331635 0.618897 -0.055672
1 1.304283 0.879927 -0.042095 1.953268 -0.309865 1.469665 -0.839837 -0.035602 0.537913 -0.173054 0.374615 0.082864 0.284018 -0.094298 -0.070482 -0.085789 -0.081138 -0.133973
2 -2.120556 -0.697950 0.234817 0.186270 -0.373380 0.360459 0.122430 0.047420 0.078046 0.030375 -0.057246 0.038257 -0.062918 -0.040858 -0.115641 0.040235 0.673554 -0.022855
3 1.327333 -0.275121 -0.374928 -0.164797 0.777257 0.541452 0.953695 -1.566035 -1.263244 0.747047 -0.169814 -0.294464 -0.308705 0.242600 -0.486330 -0.286782 -0.251512 -0.136711
4 0.326883 -0.139276 -0.042961 -0.366488 -0.878076 0.240980 0.117806 0.296860 -1.171493 -0.263073 0.617865 -0.754383 -0.200157 0.044210 -0.236431 -0.068766 -0.115102 -0.078526
5 2.414801 -0.043133 -0.078185 1.107894 -0.289075 -0.608097 0.483539 2.052102 -0.225982 0.701010 -0.700721 -0.727103 -0.639553 0.051099 -0.095169 -0.204071 -0.284754 -0.228468
6 1.823110 0.096358 -1.003466 -1.498565 0.572658 0.383197 -0.720374 -1.629161 0.102991 -0.954128 -0.474285 0.374446 0.859501 -0.088090 0.293430 -0.572480 -0.153949 -0.087966
7 1.210261 0.110829 -0.678787 1.047251 0.926762 0.254405 1.028610 -0.134819 0.245218 -0.400337 0.672509 1.042770 -0.360854 -0.542020 0.288949 -0.382263 -0.141532 -0.128719
8 0.107254 -0.135549 -0.118391 -1.264921 -0.015412 0.760699 -0.068676 -0.538159 -0.718182 -0.633014 -0.671142 1.073506 0.297788 0.382264 -0.847217 -0.196462 -0.324890 -0.233915
In [43]:
#Apply PCA by fitting the good data with only 10 dimensions (the first 10 components explain ~87% of the variance above; 11 would explain ~89%)
nc = 10
pca = PCA(n_components=nc).fit(log_data)

#Transform the good data using the PCA fit above
reduced_data = pca.transform(log_data)

#Transform log_samples using the PCA fit above
pca_samples = pca.transform(log_samples)

#Create a DataFrame for the reduced data
column_names = []
for i in range(1,nc+1):
    name = 'Dimension '+str(i)
    column_names.append(name)
reduced_data = pd.DataFrame(reduced_data, columns = column_names)
In [44]:
# Display the PCA-reduced data (currently disabled)
#display(reduced_data)

Visualizing a Biplot

A biplot is a scatterplot where each data point is represented by its scores along the principal components. The axes are the principal components. In addition, the biplot shows the projection of the original features along the components. A biplot can help us interpret the reduced dimensions of the data, and discover relationships between the principal components and original features.

In [45]:
# Create a biplot
vs.biplot(log_data, reduced_data, pca)
Out[45]:
<matplotlib.axes._subplots.AxesSubplot at 0x1c8552c1a20>

Creating Clusters k-means

Investigate whether feature scaling and dimensionality reduction aid in the creation of better clusters.

In [46]:
#Example https://scikit-learn.org/stable/auto_examples/cluster/plot_dbscan.html#sphx-glr-auto-examples-cluster-plot-dbscan-py
def applyKmeans(k,data):
    #Apply clustering algorithm to the data 
    clusterer = KMeans(n_clusters=k,random_state=1).fit(data)
    #Predict the cluster for each data point
    preds = clusterer.predict(data)
    #Find the cluster centers (not applicable to DB scan)
    centers = clusterer.cluster_centers_
    #Calculate the mean silhouette coefficient for the number of clusters chosen
    score = silhouette_score(data, preds)
    print("K="+str(k) + ", Silhouette Score - "+ str(score))
In [47]:
#Investigate what is the optimal value for k
clustersN=[2,3,4,5,10,15,20,25,30,35,40,45,50,60,70,80,90,100]
for i in clustersN:
    applyKmeans(i,reduced_data)
K=2, Silhouette Score - 0.22262824559781863
K=3, Silhouette Score - 0.197938280607287
K=4, Silhouette Score - 0.14799597281540847
K=5, Silhouette Score - 0.14922108533016637
K=10, Silhouette Score - 0.10770142595265121
K=15, Silhouette Score - 0.1116385710386286
K=20, Silhouette Score - 0.10955929404079649
K=25, Silhouette Score - 0.10501640204102061
K=30, Silhouette Score - 0.10069649409900068
K=35, Silhouette Score - 0.10730642035369138
K=40, Silhouette Score - 0.09667682311234543
K=45, Silhouette Score - 0.10003631748542147
K=50, Silhouette Score - 0.10557446892408442
K=60, Silhouette Score - 0.09700226089632806
K=70, Silhouette Score - 0.10464939069185518
K=80, Silhouette Score - 0.09881345174377527
K=90, Silhouette Score - 0.10251709274797294
K=100, Silhouette Score - 0.10509426293548445
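
Reading the best k off a printed list works, but returning the scores makes the choice explicit. The sketch below is one way to do that; `sweep` is a hypothetical helper, and the lookup table stands in for a real scoring function, using a few of the silhouette scores above rounded to four decimals.

```python
def sweep(score_for, candidates):
    """Score every candidate k and return (best_k, scores) instead of only printing."""
    scores = {k: score_for(k) for k in candidates}
    best_k = max(scores, key=scores.get)
    return best_k, scores

# Stand-in for a real scoring function: scores copied (rounded) from the output above
recorded = {2: 0.2226, 3: 0.1979, 4: 0.1480, 5: 0.1492, 10: 0.1077, 15: 0.1116, 20: 0.1096}

best_k, scores = sweep(recorded.get, sorted(recorded))
print("best k =", best_k, "with silhouette", scores[best_k])
```

In the notebook, `score_for` could be a function that fits K-means for a given k and returns its silhouette score.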

Cluster Visualization for optimal k-means

In [48]:
k=2
#Apply clustering algorithm to the data 
clusterer = KMeans(n_clusters=k,random_state=1).fit(reduced_data)
#Predict the cluster for each data point
preds = clusterer.predict(reduced_data)
#Find the cluster centers (not applicable to DB scan)
centers = clusterer.cluster_centers_
#Calculate the mean silhouette coefficient for the number of clusters chosen
score = silhouette_score(reduced_data, preds)

# Display the results of the clustering from implementation
vs.cluster_results(reduced_data, preds, centers, pca_samples)

Data Recovery

Apply the inverse of each transformation, in reverse order, to recover the cluster centers in the original units.

In [49]:
#Inverse transform the centers
log_centers = pca.inverse_transform(centers)

#Exponentiate the centers and remove the 1 from earlier (used to prevent log 0 error)
true_centers = np.exp(log_centers) - 1

# Display the true centers
segments = ['Segment {}'.format(i) for i in range(0,len(centers))]
true_centers = pd.DataFrame(np.round(true_centers), columns = good_data.keys())
true_centers.index = segments
display(true_centers)
true_centers.to_csv('summary_clusters.csv')
hits_iabCat_IAB_1 hits_iabCat_IAB_10 hits_iabCat_IAB_12 hits_iabCat_IAB_13 hits_iabCat_IAB_15 hits_iabCat_IAB_17 hits_iabCat_IAB_18 hits_iabCat_IAB_19 hits_iabCat_IAB_2 hits_iabCat_IAB_21 hits_iabCat_IAB_22 hits_iabCat_IAB_3 hits_iabCat_IAB_4 hits_iabCat_IAB_5 hits_iabCat_IAB_7 hits_iabCat_IAB_9 revenue_mil_usd total_employees
Segment 0 2.0 0.0 7.0 2.0 0.0 1.0 1.0 8.0 2.0 0.0 1.0 0.0 0.0 1.0 0.0 2.0 14.0 86.0
Segment 1 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 13.0 100.0
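
The log(x + 1) scaling and its inverse exp(v) - 1 form an exact round trip, which the following plain-Python sketch checks on made-up values (the numbers are illustrative only). The PCA step is different: once components are dropped, `inverse_transform` gives only an approximation of the original data, so the recovered centers should be read as estimates.

```python
import math

counts = [0.0, 1.0, 7.0, 86.0]                  # illustrative values only
logged = [math.log1p(x) for x in counts]        # same as log(x + 1)
recovered = [math.expm1(v) for v in logged]     # same as exp(v) - 1

for original, back in zip(counts, recovered):
    print(original, back)
```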

Summary K-means

This is definitely an improvement on the benchmark model, as we can at least see some form of clustering that might provide useful insights to marketers.

Clustering DB Scan

Investigate alternative clustering methods where the number of clusters is found based on distance (epsilon)

https://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html

https://scikit-learn.org/stable/auto_examples/cluster/plot_dbscan.html#sphx-glr-auto-examples-cluster-plot-dbscan-py

sklearn.cluster.DBSCAN(eps=0.5, min_samples=5, metric='euclidean', metric_params=None, algorithm='auto', leaf_size=30, p=None, n_jobs=None)

In [50]:
# Compute DBSCAN for a given epsilon and report cluster count, noise and silhouette score
def applyDBSCAN(epsilon, data):
    db = DBSCAN(eps=epsilon, min_samples=10).fit(data)
    labels = db.labels_
    # Number of clusters in labels, ignoring noise (label -1) if present.
    n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)
    n_noise_ = list(labels).count(-1)
    # The silhouette score needs at least two distinct labels; report 0 otherwise.
    # Note: noise points (label -1) are passed to silhouette_score as if they were a cluster of their own.
    if len(set(labels)) <= 1:
        score = 0
    else:
        score = metrics.silhouette_score(data, labels)
    print("Epsilon= " + str(epsilon) + ". Silhouette Score= " + str(score) + ". Clusters= " + str(n_clusters_) + " with an estimated number of noise points = " + str(n_noise_))
In [51]:
epsilons = [0.0001, 0.001, 0.01, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50]
for eps in epsilons:
    applyDBSCAN(eps, reduced_data)
Epsilon= 0.0001. Silhouette Score= 0. Clusters= 0 with an estimated number of noise points = 1899
Epsilon= 0.001. Silhouette Score= 0. Clusters= 0 with an estimated number of noise points = 1899
Epsilon= 0.01. Silhouette Score= 0. Clusters= 0 with an estimated number of noise points = 1899
Epsilon= 1. Silhouette Score= 0.05563577684907343. Clusters= 1 with an estimated number of noise points = 1422
Epsilon= 2. Silhouette Score= 0.2571510599297678. Clusters= 1 with an estimated number of noise points = 186
Epsilon= 3. Silhouette Score= 0.40364274097546005. Clusters= 1 with an estimated number of noise points = 8
Epsilon= 4. Silhouette Score= 0. Clusters= 1 with an estimated number of noise points = 0
Epsilon= 5. Silhouette Score= 0. Clusters= 1 with an estimated number of noise points = 0
Epsilon= 6. Silhouette Score= 0. Clusters= 1 with an estimated number of noise points = 0
Epsilon= 7. Silhouette Score= 0. Clusters= 1 with an estimated number of noise points = 0
Epsilon= 8. Silhouette Score= 0. Clusters= 1 with an estimated number of noise points = 0
Epsilon= 9. Silhouette Score= 0. Clusters= 1 with an estimated number of noise points = 0
Epsilon= 10. Silhouette Score= 0. Clusters= 1 with an estimated number of noise points = 0
Epsilon= 20. Silhouette Score= 0. Clusters= 1 with an estimated number of noise points = 0
Epsilon= 30. Silhouette Score= 0. Clusters= 1 with an estimated number of noise points = 0
Epsilon= 40. Silhouette Score= 0. Clusters= 1 with an estimated number of noise points = 0
Epsilon= 50. Silhouette Score= 0. Clusters= 1 with an estimated number of noise points = 0

Summary DBSCAN

At best, only a single cluster can be derived. This further supports the idea that there are no truly distinct clusters present in the dataset.
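
The epsilon grid above jumps from 0.01 straight to 1, so the range in which points first start to connect is not sampled. One common heuristic for narrowing that range, which is not something this analysis used, is to sort each point's distance to its `min_samples`-th nearest neighbour and look for an elbow in the curve. The sketch below assumes scikit-learn's `NearestNeighbors`. The helper name `k_distance_curve` and the random `demo` data are hypothetical stand-ins for `reduced_data`.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def k_distance_curve(data, min_samples=10):
    """Sorted distance from each point to its min_samples-th nearest
    neighbour, counting the point itself as the first neighbour."""
    nn = NearestNeighbors(n_neighbors=min_samples).fit(data)
    distances, _ = nn.kneighbors(data)
    # The last column holds the distance to the farthest of the k neighbours
    return np.sort(distances[:, -1])

# Hypothetical stand-in for reduced_data
rng = np.random.RandomState(42)
demo = rng.normal(size=(200, 2))

curve = k_distance_curve(demo, min_samples=10)

# Percentiles of the curve give a rough range of eps values worth trying
print(np.percentile(curve, [50, 90, 99]))
```

Plotting `curve` and choosing epsilon values around the bend would give a finer grid than the one used above. Whether that reveals more than one cluster in this dataset is an open question.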

Summary & Conclusions

K-means was used to determine whether there are distinct clusters of companies in this dataset, based on their web activity on websites of different categorisations as per the IAB taxonomy. There appears to be no distinct behaviour between companies: a silhouette score of 0.19 highlights this, as does the clustering visualisation above, which shows clusters with large overlaps.
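
For context on what the silhouette score measures: for each point, a is the mean distance to the other points in its own cluster, b is the lowest mean distance to the points of any other cluster, and the point's score is (b - a) / max(a, b). The sketch below computes the mean score directly from that definition on synthetic data. The function name `silhouette_by_hand` and both datasets are illustrative assumptions, and this is not the code that produced the 0.19 figure.

```python
import numpy as np

def silhouette_by_hand(X, labels):
    """Mean silhouette coefficient computed directly from its definition."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances between all points
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False  # exclude the point itself
        if not same.any():
            scores.append(0.0)  # convention: singleton clusters score 0
            continue
        a = dists[i, same].mean()
        b = min(dists[i, labels == other].mean()
                for other in set(labels.tolist()) if other != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

rng = np.random.RandomState(0)
labels = np.array([0] * 50 + [1] * 50)

# Two groups far apart versus two groups that largely overlap
separated = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(10, 1, (50, 2))])
overlapping = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(1, 1, (50, 2))])

s_sep = silhouette_by_hand(separated, labels)
s_over = silhouette_by_hand(overlapping, labels)
print(round(s_sep, 2), round(s_over, 2))
```

Scores close to 1 indicate compact, well-separated clusters, while scores near 0 indicate that points sit about as close to a neighbouring cluster as to their own.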

Next Steps for the Business:

  • There are clearly some serious data quality issues here. The business needs to decide whether it is worth investing in monitoring any of the blacklisted categories, or whether to stop tracking activity on websites in blacklisted IAB categories.

  • Click data needs to be increased in order to drive any significant machine learning inferences going forward. Right now we do not have significant data across all accounts. I'd suggest we invest in serving display ads to all companies in our database to build up a baseline for activity and help drive future analysis.

  • There are some serious outliers, which should be taken into consideration in the existing scoring models, as they could skew the entire accounts list. It is highly likely that they will always bubble to the top of client accounts.

Further Analysis:

  • Can any inferences be made at a level of granularity other than company, e.g. at the country level?
  • Hierarchical clustering methods to determine the number of clusters.
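
As a starting point for the hierarchical clustering idea, the sketch below uses SciPy's `linkage` and `fcluster` on synthetic data and picks the number of clusters from the largest jump in merge distance. The data and the "largest gap" rule are assumptions for illustration. Other cut-off rules exist, and nothing here has been run on the company dataset.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Synthetic data (illustrative assumption): two well-separated groups
rng = np.random.RandomState(1)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(30, 2)),
    rng.normal(loc=10.0, scale=0.5, size=(30, 2)),
])

# Ward linkage: each row of Z records one merge, column 2 is the merge distance
Z = linkage(X, method="ward")
merge_distances = Z[:, 2]

# Cut the tree just before the largest jump in merge distance.
# After merge number k (0-indexed) there are len(X) - (k + 1) clusters left.
gap_index = int(np.argmax(np.diff(merge_distances)))
n_clusters = len(X) - (gap_index + 1)

# Assign each point to one of the n_clusters flat clusters (labels start at 1)
labels = fcluster(Z, t=n_clusters, criterion="maxclust")
print(n_clusters, np.bincount(labels)[1:])
```

On heavily overlapping data the merge distances tend to grow smoothly, so a clear gap may not appear; that in itself would be further evidence against distinct segments.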